Equitability, interval estimation, and statistical power

Reshef, Yakir A.; Reshef, David N.; Sabeti, Pardis C.; Mitzenmacher, Michael M.

arXiv:1505.02212 (math)

[Submitted on 9 May 2015 (v1), last revised 12 May 2015 (this version, v2)]

Abstract: For analysis of a high-dimensional dataset, a common approach is to test a null hypothesis of statistical independence on all variable pairs using a non-parametric measure of dependence. However, because this approach attempts to identify any non-trivial relationship no matter how weak, it often identifies too many relationships to be useful. What is needed is a way of identifying a smaller set of relationships that merit detailed further analysis.
Here we formally present and characterize equitability, a property of measures of dependence that aims to overcome this challenge. Notionally, an equitable statistic is a statistic that, given some measure of noise, assigns similar scores to equally noisy relationships of different types [Reshef et al. 2011]. We begin by formalizing this idea via a new object called the interpretable interval, which functions as an interval estimate of the amount of noise in a relationship of unknown type. We define an equitable statistic as one with small interpretable intervals.
We then draw on the equivalence of interval estimation and hypothesis testing to show that under moderate assumptions an equitable statistic is one that yields well-powered tests for distinguishing not only between trivial and non-trivial relationships of all kinds but also between non-trivial relationships of different strengths. This means that equitability allows us to specify a threshold relationship strength $x_0$ and to search for relationships of all kinds with strength greater than $x_0$. Thus, equitability can be thought of as a strengthening of power against independence that enables fruitful analysis of datasets with a small number of strong, interesting relationships and a large number of weaker ones. We conclude with a demonstration of how our two equivalent characterizations of equitability can be used to evaluate the equitability of a statistic in practice.
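As a rough illustration of the interpretable-interval idea, the following Python sketch estimates such intervals by simulation. It is not the authors' procedure. It assumes that noise is measured by $R^2$ between the noiseless function value and the observed response, that the relationships of interest are three hypothetical functions on $[0, 1]$, and that the statistic under study is the squared Pearson correlation, chosen here as a statistic that is not expected to be equitable. All names (`FUNCTIONS`, `sample_statistic`, `reliable_intervals`, `interpretable_interval`) are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical set of relationship types (noiseless functions on [0, 1]).
FUNCTIONS = {
    "linear": lambda x: x,
    "parabola": lambda x: 4.0 * (x - 0.5) ** 2,
    "sine": lambda x: np.sin(4.0 * np.pi * x),
}


def sample_statistic(f, r2, n=200):
    """Draw n points from y = f(x) + noise and return the squared Pearson
    correlation of x and y. The noise scale is set so that
    Var(f(X)) / (Var(f(X)) + sigma^2) is approximately r2 (assumption:
    Var(f(X)) is approximated by the sample variance)."""
    x = rng.uniform(0.0, 1.0, n)
    fx = f(x)
    sigma = np.sqrt(np.var(fx) * (1.0 - r2) / r2)
    y = fx + rng.normal(0.0, sigma, n)
    return np.corrcoef(x, y)[0, 1] ** 2


def reliable_intervals(r2_grid, alpha=0.1, trials=100, n=200):
    """For each noise level, estimate a range of statistic values that
    covers the central 1 - alpha mass for every function type."""
    lo = np.empty(len(r2_grid))
    hi = np.empty(len(r2_grid))
    for i, r2 in enumerate(r2_grid):
        quantiles = np.array([
            np.quantile(
                [sample_statistic(f, r2, n) for _ in range(trials)],
                [alpha / 2, 1 - alpha / 2],
            )
            for f in FUNCTIONS.values()
        ])
        lo[i] = quantiles[:, 0].min()  # lowest lower quantile over types
        hi[i] = quantiles[:, 1].max()  # highest upper quantile over types
    return lo, hi


def interpretable_interval(y, r2_grid, lo, hi):
    """Range of grid noise levels whose reliable interval contains the
    observed statistic value y; None if no grid level is consistent."""
    consistent = (lo <= y) & (y <= hi)
    if not consistent.any():
        return None
    return r2_grid[consistent].min(), r2_grid[consistent].max()


r2_grid = np.linspace(0.05, 1.0, 10)
lo, hi = reliable_intervals(r2_grid)
for y in (0.01, 0.5, 0.9):
    print(y, interpretable_interval(y, r2_grid, lo, hi))
```

Under these assumptions, a narrow interval at a given score means the score pins down the noise level regardless of relationship type. For the squared Pearson correlation the interval at scores near zero should be wide, because a parabola symmetric about 0.5 has almost no linear correlation with `x` at any noise level.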

Comments:	Yakir A. Reshef and David N. Reshef are co-first authors, Pardis C. Sabeti and Michael M. Mitzenmacher are co-last authors. This paper, together with arXiv:1505.02212, subsumes arXiv:1408.4908
Subjects:	Statistics Theory (math.ST); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Methodology (stat.ME); Machine Learning (stat.ML)
Cite as:	arXiv:1505.02212 [math.ST]
	(or arXiv:1505.02212v2 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.1505.02212

Submission history

From: Yakir Reshef [view email]
[v1] Sat, 9 May 2015 00:31:23 UTC (2,841 KB)
[v2] Tue, 12 May 2015 20:05:17 UTC (2,841 KB)
