Search | arXiv e-print repository

A Causal Inference Framework for Data Rich Environments

Authors: Alberto Abadie, Anish Agarwal, Devavrat Shah

Abstract: We propose a formal model for counterfactual estimation with unobserved confounding in "data-rich" settings, i.e., where there are a large number of units and a large number of measurements per unit. Our model provides a bridge between the structural causal model view of causal inference common in the graphical models literature with that of the latent factor model view common in the potential out… ▽ More We propose a formal model for counterfactual estimation with unobserved confounding in "data-rich" settings, i.e., where there are a large number of units and a large number of measurements per unit. Our model provides a bridge between the structural causal model view of causal inference common in the graphical models literature with that of the latent factor model view common in the potential outcomes literature. We show how classic models for potential outcomes and treatment assignments fit within our framework. We provide an identification argument for the average treatment effect, the average treatment effect on the treated, and the average treatment effect on the untreated. For any estimator that has a fast enough estimation error rate for a certain nuisance parameter, we establish it is consistent for these various causal parameters. We then show principal component regression is one such estimator that leads to consistent estimation, and we analyze the minimal smoothness required of the potential outcomes function for consistency. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2502.05340 [pdf, other]

Robust valuation and optimal harvesting of forestry resources in the presence of catastrophe risk and parameter uncertainty

Authors: Ankush Agarwal, Christian Ewald, Yihan Zou

Abstract: We determine forest lease value and optimal harvesting strategies under model parameter uncertainty within stochastic bio-economic models that account for catastrophe risk. Catastrophic events are modeled as a Poisson point process, with a two-factor stochastic convenience yield model capturing the lumber spot price dynamics. Using lumber futures and US wildfire data, we estimate model parameters… ▽ More We determine forest lease value and optimal harvesting strategies under model parameter uncertainty within stochastic bio-economic models that account for catastrophe risk. Catastrophic events are modeled as a Poisson point process, with a two-factor stochastic convenience yield model capturing the lumber spot price dynamics. Using lumber futures and US wildfire data, we estimate model parameters through a Kalman filter and maximum likelihood estimation and define the model parameter uncertainty set as the 95% confidence region. We numerically determine the forest lease value under catastrophe risk and parameter uncertainty using reflected backward stochastic differential equations (RBSDEs) and establish conservative and optimistic bounds for lease values and optimal stopping boundaries for harvesting, facilitating Monte Carlo simulations. Numerical experiments further explore how parameter uncertainty, catastrophe intensity, and carbon sequestration impact the lease valuation and harvesting decision. In particular, we explore the costs arising from this form of uncertainty in the form of a reduction of the lease value. These are implicit costs that can be attributed to climate risk and will be emphasized through the importance of forestry resources in the energy transition process. We conclude that in the presence of parameter uncertainty, it is better to lean toward a conservative strategy reflecting, to some extent, the worst case than being overly optimistic. Our results also highlight the critical role of convenience yield in determining optimal harvesting strategies. △ Less

Submitted 7 February, 2025; originally announced February 2025.

arXiv:2410.02091 [pdf]

The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot

Authors: Fangchen Song, Ashish Agarwal, Wen Wen

Abstract: Generative artificial intelligence (AI) enables automated content production, including coding in software development, which can significantly influence developer participation and performance. To explore its impact on collaborative open-source software (OSS) development, we investigate the role of GitHub Copilot, a generative AI pair programmer, in OSS development where multiple distributed deve… ▽ More Generative artificial intelligence (AI) enables automated content production, including coding in software development, which can significantly influence developer participation and performance. To explore its impact on collaborative open-source software (OSS) development, we investigate the role of GitHub Copilot, a generative AI pair programmer, in OSS development where multiple distributed developers voluntarily collaborate. Using GitHub's proprietary Copilot usage data, combined with public OSS repository data obtained from GitHub, we find that Copilot use increases project-level code contributions by 5.9%. This gain is driven by a 2.1% increase in individual code contributions and a 3.4% rise in developer coding participation. However, these benefits come at a cost as coordination time for code integration increases by 8% due to more code discussions enabled by AI pair programmers. This reveals an important tradeoff: While AI expands who can contribute and how much they contribute, it slows coordination in collective development efforts. Despite this tension, the combined effect of these two competing forces remains positive, indicating a net gain in overall project-level productivity from using AI pair programmers. Interestingly, we also find the effects differ across developer roles. Peripheral developers show relatively smaller gains in project-level code contributions and face a higher increase in coordination time than core developers, likely due to the difference in their project familiarity. In summary, our study underscores the dual role of AI pair programmers in affecting project-level code contributions and coordination time in OSS development. Our findings on the differential effects between core and peripheral developers also provide important implications for the structure of OSS communities in the long run. △ Less

Submitted 8 July, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

arXiv:2402.11652 [pdf, other]

Doubly Robust Inference in Causal Latent Factor Models

Authors: Alberto Abadie, Anish Agarwal, Raaz Dwivedi, Abhin Shah

Abstract: This article introduces a new estimator of average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show t… ▽ More This article introduces a new estimator of average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate. Simulation results demonstrate the relevance of the formal properties of the estimators analyzed in this article. △ Less

Submitted 29 October, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

arXiv:2312.16307 [pdf, other]

Incentive-Aware Synthetic Control: Accurate Counterfactual Estimation via Incentivized Exploration

Authors: Daniel Ngo, Keegan Harris, Anish Agarwal, Vasilis Syrgkanis, Zhiwei Steven Wu

Abstract: We consider the setting of synthetic control methods (SCMs), a canonical approach used to estimate the treatment effect on the treated in a panel data setting. We shed light on a frequently overlooked but ubiquitous assumption made in SCMs of "overlap": a treated unit can be written as some combination -- typically, convex or linear combination -- of the units that remain under control. We show th… ▽ More We consider the setting of synthetic control methods (SCMs), a canonical approach used to estimate the treatment effect on the treated in a panel data setting. We shed light on a frequently overlooked but ubiquitous assumption made in SCMs of "overlap": a treated unit can be written as some combination -- typically, convex or linear combination -- of the units that remain under control. We show that if units select their own interventions, and there is sufficiently large heterogeneity between units that prefer different interventions, overlap will not hold. We address this issue by proposing a framework which incentivizes units with different preferences to take interventions they would not normally consider. Specifically, leveraging tools from information design and online learning, we propose a SCM that incentivizes exploration in panel data settings by providing incentive-compatible intervention recommendations to units. We establish this estimator obtains valid counterfactual estimates without the need for an a priori overlap assumption. We extend our results to the setting of synthetic interventions, where the goal is to produce counterfactual outcomes under all interventions, not just control. Finally, we provide two hypothesis tests for determining whether unit overlap holds for a given panel dataset. △ Less

Submitted 13 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2307.01357 [pdf, other]

Adaptive Principal Component Regression with Applications to Panel Data

Authors: Anish Agarwal, Keegan Harris, Justin Whitehouse, Zhiwei Steven Wu

Abstract: Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite sample guarantees for (regularized) PCR whenever data is collected adaptively. Since the proof techniques for analyzing PCR in the fixe… ▽ More Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite sample guarantees for (regularized) PCR whenever data is collected adaptively. Since the proof techniques for analyzing PCR in the fixed design setting do not readily extend to the online setting, our results rely on adapting tools from modern martingale concentration to the error-in-variables setting. We demonstrate the usefulness of our bounds by applying them to the domain of panel data, a ubiquitous setting in econometrics and statistics. As our first application, we provide a framework for experiment design in panel data settings when interventions are assigned adaptively. Our framework may be thought of as a generalization of the synthetic control and synthetic interventions frameworks, where data is collected via an adaptive intervention assignment policy. Our second application is a procedure for learning such an intervention assignment policy in a setting where units arrive sequentially to be treated. In addition to providing theoretical performance guarantees (as measured by regret), we show that our method empirically outperforms a baseline which does not leverage error-in-variables regression. △ Less

Submitted 4 August, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

arXiv:2306.13681 [pdf, other]

Estimating the Value of Evidence-Based Decision Making

Authors: Alberto Abadie, Anish Agarwal, Guido Imbens, Siwei Jia, James McQueen, Serguei Stepaniants

Abstract: Business/policy decisions are often based on evidence from randomized experiments and observational studies. In this article we propose an empirical framework to estimate the value of evidence-based decision making (EBDM) and the return on the investment in statistical precision. Business/policy decisions are often based on evidence from randomized experiments and observational studies. In this article we propose an empirical framework to estimate the value of evidence-based decision making (EBDM) and the return on the investment in statistical precision. △ Less

Submitted 9 September, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2303.14226 [pdf, other]

Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions

Authors: Abhineet Agarwal, Anish Agarwal, Suhas Vijaykumar

Abstract: Consider a setting where there are $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing a combination of interventions is a problem that naturally arises in a variety of applications such as factorial design experiments, recommendation engines, combinatio… ▽ More Consider a setting where there are $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing a combination of interventions is a problem that naturally arises in a variety of applications such as factorial design experiments, recommendation engines, combination therapies in medicine, conjoint analysis, etc. Running $N \times 2^p$ experiments to estimate the various parameters is likely expensive and/or infeasible as $N$ and $p$ grow. Further, with observational data there is likely confounding, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. To address these challenges, we propose a novel latent factor model that imposes structure across units (i.e., the matrix of potential outcomes is approximately rank $r$), and combinations of interventions (i.e., the coefficients in the Fourier expansion of the potential outcomes is approximately $s$ sparse). We establish identification for all $N \times 2^p$ parameters despite unobserved confounding. We propose an estimation procedure, Synthetic Combinations, and establish it is finite-sample consistent and asymptotically normal under precise conditions on the observation pattern. Our results imply consistent estimation given $\text{poly}(r) \times \left( N + s^2p\right)$ observations, while previous methods have sample complexity scaling as $\min(N \times s^2p, \ \ \text{poly(r)} \times (N + 2^p))$. We use Synthetic Combinations to propose a data-efficient experimental design. Empirically, Synthetic Combinations outperforms competing approaches on a real-world dataset on movie recommendations. Lastly, we extend our analysis to do causal inference where the intervention is a permutation over $p$ items (e.g., rankings). △ Less

Submitted 15 January, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2211.14236 [pdf, other]

Strategyproof Decision-Making in Panel Data Settings and Beyond

Authors: Keegan Harris, Anish Agarwal, Chara Podimata, Zhiwei Steven Wu

Abstract: We consider the problem of decision-making using panel data, in which a decision-maker gets noisy, repeated measurements of multiple units (or agents). We consider a setup where there is a pre-intervention period, when the principal observes the outcomes of each unit, after which the principal uses these observations to assign a treatment to each unit. Unlike this classical setting, we permit the… ▽ More We consider the problem of decision-making using panel data, in which a decision-maker gets noisy, repeated measurements of multiple units (or agents). We consider a setup where there is a pre-intervention period, when the principal observes the outcomes of each unit, after which the principal uses these observations to assign a treatment to each unit. Unlike this classical setting, we permit the units generating the panel data to be strategic, i.e. units may modify their pre-intervention outcomes in order to receive a more desirable intervention. The principal's goal is to design a strategyproof intervention policy, i.e. a policy that assigns units to their utility-maximizing interventions despite their potential strategizing. We first identify a necessary and sufficient condition under which a strategyproof intervention policy exists, and provide a strategyproof mechanism with a simple closed form when one does exist. Along the way, we prove impossibility results for strategic multiclass classification, which may be of independent interest. When there are two interventions, we establish that there always exists a strategyproof mechanism, and provide an algorithm for learning such a mechanism. For three or more interventions, we provide an algorithm for learning a strategyproof mechanism if there exists a sufficiently large gap in the principal's rewards between different interventions. Finally, we empirically evaluate our model using real-world panel data collected from product sales over 18 months. We find that our methods compare favorably to baselines which do not take strategic interactions into consideration, even in the presence of model misspecification. △ Less

Submitted 21 December, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: In the fiftieth ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2024)

arXiv:2210.11355 [pdf, other]

Network Synthetic Interventions: A Causal Framework for Panel Data Under Network Interference

Authors: Anish Agarwal, Sarah H. Cen, Devavrat Shah, Christina Lee Yu

Abstract: We propose a generalization of the synthetic controls and synthetic interventions methodology to incorporate network interference. We consider the estimation of unit-specific potential outcomes from panel data in the presence of spillover across units and unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the fact… ▽ More We propose a generalization of the synthetic controls and synthetic interventions methodology to incorporate network interference. We consider the estimation of unit-specific potential outcomes from panel data in the presence of spillover across units and unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the factor models typically used in panel data settings. We propose an estimator, Network Synthetic Interventions (NSI), and show that it consistently estimates the mean outcomes for a unit under an arbitrary set of counterfactual treatments for the network. We further establish that the estimator is asymptotically normal. We furnish two validity tests for whether the NSI estimator reliably generalizes to produce accurate counterfactual estimates. We provide a novel graph-based experiment design that guarantees the NSI estimator produces accurate counterfactual estimates, and also analyze the sample complexity of the proposed design. We conclude with simulations that corroborate our theoretical findings. △ Less

Submitted 11 October, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 49 pages, 6 figures

arXiv:2210.11003 [pdf, other]

Synthetic Blip Effects: Generalizing Synthetic Controls for the Dynamic Treatment Regime

Authors: Anish Agarwal, Vasilis Syrgkanis

Abstract: We propose a generalization of the synthetic control and synthetic interventions methodology to the dynamic treatment regime. We consider the estimation of unit-specific treatment effects from panel data collected via a dynamic treatment regime and in the presence of unobserved confounding. That is, each unit receives multiple treatments sequentially, based on an adaptive policy, which depends on… ▽ More We propose a generalization of the synthetic control and synthetic interventions methodology to the dynamic treatment regime. We consider the estimation of unit-specific treatment effects from panel data collected via a dynamic treatment regime and in the presence of unobserved confounding. That is, each unit receives multiple treatments sequentially, based on an adaptive policy, which depends on a latent endogenously time-varying confounding state of the treated unit. Under a low-rank latent factor model assumption and a technical overlap assumption we propose an identification strategy for any unit-specific mean outcome under any sequence of interventions. The latent factor model we propose admits linear time-varying and time-invariant dynamical systems as special cases. Our approach can be seen as an identification strategy for structural nested mean models under a low-rank latent factor assumption on the blip effects. Our method, which we term "synthetic blip effects", is a backwards induction process, where the blip effect of a treatment at each period and for a target unit is recursively expressed as linear combinations of blip effects of a carefully chosen group of other units that received the designated treatment. Our work avoids the combinatorial explosion in the number of units that would be required by a vanilla application of prior synthetic control and synthetic intervention methods in such dynamic treatment regime settings. △ Less

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2109.15154 [pdf, other]

Causal Matrix Completion

Authors: Anish Agarwal, Munther Dahleh, Devavrat Shah, Dennis Shen

Abstract: Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved… ▽ More Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. For example, in the context of movie recommender systems -- a canonical application for matrix completion -- a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield "missing not at random" (MNAR) data, which can severely impact any inference procedure that does not correct for this bias. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We design a procedure, which we call "synthetic nearest neighbors" (SNN), to estimate these causal estimands. We prove finite-sample consistency and asymptotic normality of our estimator. Our analysis also leads to new theoretical results for the matrix completion literature. In particular, we establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic normality results for matrix completion with MNAR data. As a special case, this also provides entry-wise bounds for matrix completion with MCAR data. Across simulated and real data, we demonstrate the efficacy of our proposed estimator. △ Less

Submitted 30 September, 2021; originally announced September 2021.

arXiv:2107.02780 [pdf, other]

Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy

Authors: Anish Agarwal, Rahul Singh

Abstract: The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census, enhancing the privacy of respondents while potentially reducing the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and infer… ▽ More The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census, enhancing the privacy of respondents while potentially reducing the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency and Gaussian approximation by finite sample arguments, with a rate of $n^{ 1/2}$ for semiparametric estimands that degrades gracefully for nonparametric estimands. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and empirically validate. Our analysis provides nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. Calibrated simulations verify the coverage of our data cleaning adjusted confidence intervals and demonstrate the relevance of our results for Census-derived data. △ Less

Submitted 12 February, 2024; v1 submitted 6 July, 2021; originally announced July 2021.

ACM Class: G.3; J.4

arXiv:2006.07691 [pdf, other]

Synthetic Interventions

Authors: Anish Agarwal, Devavrat Shah, Dennis Shen

Abstract: The synthetic controls (SC) methodology is a prominent tool for policy evaluation in panel data applications. Researchers commonly justify the SC framework with a low-rank matrix factor model that assumes the potential outcomes are described by low-dimensional unit and time specific latent factors. In the recent work of [Abadie '20], one of the pioneering authors of the SC method posed the questio… ▽ More The synthetic controls (SC) methodology is a prominent tool for policy evaluation in panel data applications. Researchers commonly justify the SC framework with a low-rank matrix factor model that assumes the potential outcomes are described by low-dimensional unit and time specific latent factors. In the recent work of [Abadie '20], one of the pioneering authors of the SC method posed the question of how the SC framework can be extended to multiple treatments. This article offers one resolution to this open question that we call synthetic interventions (SI). Fundamental to the SI framework is a low-rank tensor factor model, which extends the matrix factor model by including a latent factorization over treatments. Under this model, we propose a generalization of the standard SC-based estimators. We prove the consistency for one instantiation of our approach and provide conditions under which it is asymptotically normal. Moreover, we conduct a representative simulation to study its prediction performance and revisit the canonical SC case study of [Abadie-Diamond-Hainmueller '10] on the impact of anti-tobacco legislations by exploring related questions not previously investigated. △ Less

Submitted 23 August, 2024; v1 submitted 13 June, 2020; originally announced June 2020.

arXiv:2005.00072 [pdf, other]

Two Burning Questions on COVID-19: Did shutting down the economy help? Can we (partially) reopen the economy without risking the second wave?

Authors: Anish Agarwal, Abdullah Alomar, Arnab Sarker, Devavrat Shah, Dennis Shen, Cindy Yang

Abstract: As we reach the apex of the COVID-19 pandemic, the most pressing question facing us is: can we even partially reopen the economy without risking a second wave? We first need to understand if shutting down the economy helped. And if it did, is it possible to achieve similar gains in the war against the pandemic while partially opening up the economy? To do so, it is critical to understand the effec… ▽ More As we reach the apex of the COVID-19 pandemic, the most pressing question facing us is: can we even partially reopen the economy without risking a second wave? We first need to understand if shutting down the economy helped. And if it did, is it possible to achieve similar gains in the war against the pandemic while partially opening up the economy? To do so, it is critical to understand the effects of the various interventions that can be put into place and their corresponding health and economic implications. Since many interventions exist, the key challenge facing policy makers is understanding the potential trade-offs between them, and choosing the particular set of interventions that works best for their circumstance. In this memo, we provide an overview of Synthetic Interventions (a natural generalization of Synthetic Control), a data-driven and statistically principled method to perform what-if scenario planning, i.e., for policy makers to understand the trade-offs between different interventions before having to actually enact them. In essence, the method leverages information from different interventions that have already been enacted across the world and fits it to a policy maker's setting of interest, e.g., to estimate the effect of mobility-restricting interventions on the U.S., we use daily death data from countries that enforced severe mobility restrictions to create a "synthetic low mobility U.S." and predict the counterfactual trajectory of the U.S. if it had indeed applied a similar intervention. Using Synthetic Interventions, we find that lifting severe mobility restrictions and only retaining moderate mobility restrictions (at retail and transit locations), seems to effectively flatten the curve. We hope this provides guidance on weighing the trade-offs between the safety of the population, strain on the healthcare system, and impact on the economy. △ Less

Submitted 10 May, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

arXiv:1902.05622 [pdf, other]

The Shapley Taylor Interaction Index

Authors: Kedar Dhamdhere, Ashish Agarwal, Mukund Sundararajan

Abstract: The attribution problem, that is the problem of attributing a model's prediction to its base features, is well-studied. We extend the notion of attribution to also apply to feature interactions. The Shapley value is a commonly used method to attribute a model's prediction to its base features. We propose a generalization of the Shapley value called Shapley-Taylor index that attributes the model'… ▽ More The attribution problem, that is the problem of attributing a model's prediction to its base features, is well-studied. We extend the notion of attribution to also apply to feature interactions. The Shapley value is a commonly used method to attribute a model's prediction to its base features. We propose a generalization of the Shapley value called Shapley-Taylor index that attributes the model's prediction to interactions of subsets of features up to some size k. The method is analogous to how the truncated Taylor Series decomposes the function value at a certain point using its derivatives at a different point. In fact, we show that the Shapley Taylor index is equal to the Taylor Series of the multilinear extension of the set-theoretic behavior of the model. We axiomatize this method using the standard Shapley axioms -- linearity, dummy, symmetry and efficiency -- and an additional axiom that we call the interaction distribution axiom. This new axiom explicitly characterizes how interactions are distributed for a class of functions that model pure interaction. We contrast the Shapley-Taylor index against the previously proposed Shapley Interaction index (cf. [9]) from the cooperative game theory literature. We also apply the Shapley Taylor index to three models and identify interesting qualitative insights. △ Less

Submitted 7 February, 2020; v1 submitted 14 February, 2019; originally announced February 2019.

Showing 1–16 of 16 results for author: Agarwal, A