Search | arXiv e-print repository

arXiv:2107.06428

For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets

Authors: Brian L. Trippe, Hilary K. Finucane, Tamara Broderick

Abstract: Hierarchical Bayesian methods enable information sharing across multiple related regression problems. While standard practice is to model regression parameters (effects) as (1) exchangeable across datasets and (2) correlated to differing degrees across covariates, we show that this approach exhibits poor statistical performance when the number of covariates exceeds the number of datasets. For instance, in statistical genetics, we might regress dozens of traits (defining datasets) for thousands of individuals (responses) on up to millions of genetic variants (covariates). When an analyst has more covariates than datasets, we argue that it is often more natural to instead model effects as (1) exchangeable across covariates and (2) correlated to differing degrees across datasets. To this end, we propose a hierarchical model expressing our alternative perspective. We devise an empirical Bayes estimator for learning the degree of correlation between datasets. We develop theory that demonstrates that our method outperforms the classic approach when the number of covariates dominates the number of datasets, and corroborate this result empirically on several high-dimensional multiple regression and classification problems.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 10 pages plus supplementary material
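A minimal sketch of the "exchangeable across covariates" idea in the simplest setting we could write out by hand. Assumptions (ours, not the paper's): D = 2 datasets; an orthonormal design in each dataset, so each per-covariate least-squares estimate equals the true effect plus independent Gaussian noise with a known standard deviation; effects drawn i.i.d. over covariates from N(0, Σ) with Σ a 2×2 between-dataset covariance. Σ is estimated by a plain method of moments and the estimates are shrunk with the plug-in posterior mean. This is an illustration of the modeling perspective, not the empirical Bayes estimator from the paper, and every function name is hypothetical.

```python
import random


def simulate(num_covariates, noise_sd, cov, seed=0):
    """Draw effects beta_j ~ N(0, cov) i.i.d. over covariates j (two datasets),
    then noisy per-dataset estimates betahat_j = beta_j + N(0, noise_sd^2 I)."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 between-dataset covariance.
    l11 = cov[0][0] ** 0.5
    l21 = cov[1][0] / l11
    l22 = (cov[1][1] - l21 ** 2) ** 0.5
    betas, betahats = [], []
    for _ in range(num_covariates):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        beta = (l11 * z1, l21 * z1 + l22 * z2)
        betas.append(beta)
        betahats.append(tuple(b + rng.gauss(0.0, noise_sd) for b in beta))
    return betas, betahats


def estimate_cov(betahats, noise_sd):
    """Method of moments: E[betahat betahat^T] = cov + noise_sd^2 I, so subtract
    the known noise variance from the empirical second-moment matrix.
    (In small samples the result need not be positive semidefinite.)"""
    n = len(betahats)
    second = [[sum(b[i] * b[k] for b in betahats) / n for k in range(2)]
              for i in range(2)]
    cov_hat = [[second[i][k] - (noise_sd ** 2 if i == k else 0.0)
                for k in range(2)] for i in range(2)]
    return cov_hat, second


def shrink(betahats, cov_hat, second):
    """Plug-in posterior mean cov_hat @ inv(cov_hat + noise_sd^2 I) @ betahat;
    the matrix being inverted is exactly `second`."""
    det = second[0][0] * second[1][1] - second[0][1] * second[1][0]
    inv = [[second[1][1] / det, -second[0][1] / det],
           [-second[1][0] / det, second[0][0] / det]]
    gain = [[sum(cov_hat[i][m] * inv[m][k] for m in range(2)) for k in range(2)]
            for i in range(2)]
    return [tuple(gain[i][0] * b[0] + gain[i][1] * b[1] for i in range(2))
            for b in betahats]


def mse(estimates, truth):
    """Mean squared error over all covariates and datasets."""
    sq = [(e - t) ** 2 for ep, tp in zip(estimates, truth) for e, t in zip(ep, tp)]
    return sum(sq) / len(sq)


if __name__ == "__main__":
    betas, betahats = simulate(20000, 1.0, [[1.0, 0.8], [0.8, 1.0]])
    cov_hat, second = estimate_cov(betahats, 1.0)
    print("estimated between-dataset covariance:", cov_hat)
    print("MSE raw:", mse(betahats, betas))
    print("MSE shrunk:", mse(shrink(betahats, cov_hat, second), betas))
```

One way to read the design choice: with thousands of covariates and only two datasets, the small matrix Σ is estimated from thousands of covariate-level replicates, which is the regime the abstract singles out.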

arXiv:1505.02213

Measuring dependence powerfully and equitably

Authors: Yakir A. Reshef, David N. Reshef, Hilary K. Finucane, Pardis C. Sabeti, Michael M. Mitzenmacher

Abstract: Given a high-dimensional data set we often wish to find the strongest relationships within it. A common strategy is to evaluate a measure of dependence on every variable pair and retain the highest-scoring pairs for follow-up. This strategy works well if the statistic used is equitable [Reshef et al. 2015a], i.e., if, for some measure of noise, it assigns similar scores to equally noisy relationships regardless of relationship type (e.g., linear, exponential, periodic). In this paper, we introduce and characterize a population measure of dependence called MIC*. We show three ways that MIC* can be viewed: as the population value of MIC, a highly equitable statistic from [Reshef et al. 2011], as a canonical "smoothing" of mutual information, and as the supremum of an infinite sequence defined in terms of optimal one-dimensional partitions of the marginals of the joint distribution. Based on this theory, we introduce an efficient approach for computing MIC* from the density of a pair of random variables, and we define a new consistent estimator MICe for MIC* that is efficiently computable. In contrast, there is no known polynomial-time algorithm for computing the original equitable statistic MIC. We show through simulations that MICe has better bias-variance properties than MIC. We then introduce and prove the consistency of a second statistic, TICe, that is a trivial side-product of the computation of MICe and whose goal is powerful independence testing rather than equitability. We show in simulations that MICe and TICe have good equitability and power against independence respectively. The analyses here complement a more in-depth empirical evaluation of several leading measures of dependence [Reshef et al. 2015b] that shows state-of-the-art performance for MICe and TICe.

Submitted 30 August, 2021; v1 submitted 8 May, 2015; originally announced May 2015.

Comments: YAR and DNR are co-first authors, PCS and MMM are co-last authors. This paper, together with arXiv:1505.02212, subsumes arXiv:1408.4908. v3 includes new analyses and exposition. v4 is identical to v3 except for this comment. An error was found in the argument showing the consistency of the MIC estimator; see arXiv:2107.03836 for discussion and a corrected argument.

Journal ref: J. Mach. Learn. Res. 17 (2016), 1-63
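For orientation, the grid recipe behind MIC: for each grid with kx columns and ky rows, with kx·ky bounded by some B(n), compute the mutual information of the binned sample, divide by log min(kx, ky), and take the maximum. The sketch below follows that recipe but only tries equal-frequency (rank-based) grids rather than optimizing the partitions, so it should never exceed the sample MIC computed with the same grid bound. It is not MIC, MIC*, MICe or TICe; the bound B(n) = n^0.6 and all function names are our assumptions.

```python
import math
from collections import Counter


def equal_freq_bins(values, k):
    """Assign each value to one of k bins of (nearly) equal size by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def grid_mutual_information(xb, yb):
    """Mutual information (nats) of the empirical joint distribution of two bin labelings."""
    n = len(xb)
    joint = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mic_sketch(x, y, alpha=0.6):
    """Max over equal-frequency grids with kx * ky <= n**alpha of
    I(grid) / log(min(kx, ky)); a crude stand-in for the sample MIC."""
    max_cells = len(x) ** alpha
    best = 0.0
    for kx in range(2, int(max_cells) + 1):
        for ky in range(2, int(max_cells // kx) + 1):
            mi = grid_mutual_information(equal_freq_bins(x, kx), equal_freq_bins(y, ky))
            best = max(best, mi / math.log(min(kx, ky)))
    return best
```

Because the binned mutual information can never exceed log min(kx, ky), each normalized score lies in [0, 1], and a noiseless monotone relationship reaches 1 already on the 2×2 grid.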

arXiv:1207.3727

Algebraically recurrent random walks on groups

Authors: Itai Benjamini, Hilary Finucane, Romain Tessera

Abstract: Initial steps are presented towards understanding which finitely generated groups are almost surely generated as semigroups by the path of a random walk on the group.

Submitted 24 December, 2012; v1 submitted 16 July, 2012; originally announced July 2012.

Comments: 9 pages
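For the simplest group, the integers, the question has an elementary answer that can be checked on finite paths: a set of integers generates Z as a semigroup exactly when it contains a positive element, a negative element, and has gcd 1 (a subsemigroup of Z with elements of both signs is a subgroup, hence equals gcd·Z). A simple random walk on Z visits every integer almost surely, so its path generates Z, whereas a walk with steps ±2 never leaves 2Z. This toy case is our own illustration, not a result from the paper, and the function names are hypothetical.

```python
import math
import random


def generates_z_as_semigroup(points):
    """True iff the semigroup generated by a finite set of integers is all of Z.

    Uses: a subsemigroup of Z containing both a positive and a negative
    element is a subgroup, so it equals gcd(points) * Z."""
    has_pos = any(p > 0 for p in points)
    has_neg = any(p < 0 for p in points)
    g = 0
    for p in points:
        g = math.gcd(g, p)  # gcd(0, p) == abs(p)
    return has_pos and has_neg and g == 1


def walk_path(steps, length, seed=0):
    """Positions X_1, ..., X_length of a random walk on Z started at 0,
    with each step drawn uniformly from `steps`."""
    rng = random.Random(seed)
    pos, path = 0, []
    for _ in range(length):
        pos += rng.choice(steps)
        path.append(pos)
    return path
```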

arXiv:1203.5624

On the scaling limit of finite vertex transitive graphs with large diameter

Authors: Itai Benjamini, Hilary Finucane, Romain Tessera

Abstract: Let $(X_n)$ be an unbounded sequence of finite, connected, vertex transitive graphs such that $|X_n| = o(\mathrm{diam}(X_n)^q)$ for some $q>0$. We show that, up to taking a subsequence and after rescaling by the diameter, the sequence $(X_n)$ converges in the Gromov-Hausdorff distance to a torus of dimension $<q$, equipped with some invariant Finsler metric. The proof relies on a recent quantitative version of Gromov's theorem on groups with polynomial growth obtained by Breuillard, Green and Tao. If $X_n$ is only roughly transitive and $|X_n| = o(\mathrm{diam}(X_n)^{\delta})$ for $\delta > 1$ sufficiently small, we prove, this time by elementary means, that $(X_n)$ converges to a circle.

Submitted 26 August, 2014; v1 submitted 26 March, 2012; originally announced March 2012.

Comments: Final version, to appear in Combinatorica
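Two worked instances of the first statement (our own examples, not taken from the paper): cycles, and two-dimensional discrete tori with the standard generators.

```latex
% Cycles C_n = Cay(Z/nZ, {+1, -1}):
|C_n| = n, \qquad \operatorname{diam}(C_n) = \lfloor n/2 \rfloor,
\qquad |C_n| = o\bigl(\operatorname{diam}(C_n)^q\bigr) \text{ for every } q > 1.
% After rescaling distances by 1/diam(C_n), the cycle has total length
\frac{n}{\lfloor n/2 \rfloor} \longrightarrow 2,
% so the limit is a circle of circumference 2: a torus of dimension 1 < q.

% Discrete tori T_n = Cay((Z/nZ)^2, {\pm e_1, \pm e_2}):
|T_n| = n^2, \qquad \operatorname{diam}(T_n) = 2\lfloor n/2 \rfloor,
\qquad |T_n| = o\bigl(\operatorname{diam}(T_n)^q\bigr) \text{ for every } q > 2.
% The graph metric is the l^1 distance on the torus, so the rescaled limit is
\mathbb{R}^2/\mathbb{Z}^2 \ \text{with the } \ell^1 \text{ norm},
% an invariant Finsler (here non-Riemannian) metric on a torus of dimension 2 < q.
```

The second example shows why the limit metric is stated as Finsler rather than Riemannian.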

arXiv:1201.4960

A recursive construction of t-wise uniform permutations

Authors: Hilary Finucane, Ron Peled, Yariv Yaari

Abstract: We present a recursive construction of a $(2t+1)$-wise uniform set of permutations on $2n$ objects using a $(2t+1)$-$(2n, n, \cdot)$ combinatorial design, a $t$-wise uniform set of permutations on $n$ objects and a $(2t+1)$-wise uniform set of permutations on $n$ objects. Using the complete design in this procedure gives a $t$-wise uniform set of permutations on $n$ objects whose size is at most $t^{2n}$, the first non-trivial construction of an infinite family of $t$-wise uniform sets for $t \geq 4$. If a non-trivial design with suitable parameters is found, it will imply a corresponding improvement in the construction.

Submitted 4 November, 2012; v1 submitted 24 January, 2012; originally announced January 2012.

Journal ref: Random Structures & Algorithms 46, no. 3 (2015): 531-540
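The property being constructed: a multiset of permutations of n objects is t-wise uniform if, for every ordered t-tuple of distinct objects, a uniformly random member sends it to each ordered t-tuple of distinct objects with equal probability. A direct checker for small cases (function name hypothetical; it enumerates all tuples, so it is only practical for tiny n and t):

```python
from collections import Counter
from itertools import permutations
from math import perm


def is_t_wise_uniform(perm_set, n, t):
    """perm_set: list of tuples, each a permutation of range(n) with p[i] = image of i.
    True iff every ordered t-tuple of distinct points is mapped to each ordered
    t-tuple of distinct points by the same number of members of perm_set."""
    num_targets = perm(n, t)  # n! / (n - t)! ordered t-tuples of distinct points
    if len(perm_set) % num_targets:
        return False
    expected = len(perm_set) // num_targets
    for source in permutations(range(n), t):
        counts = Counter(tuple(p[i] for i in source) for p in perm_set)
        if len(counts) != num_targets or any(c != expected for c in counts.values()):
            return False
    return True


if __name__ == "__main__":
    sym4 = list(permutations(range(4)))  # the full symmetric group S_4
    rotations = [tuple((i + s) % 5 for i in range(5)) for s in range(5)]
    print(is_t_wise_uniform(sym4, 4, 3), is_t_wise_uniform(rotations, 5, 2))
```

Since every ordered t-tuple must be hit equally often, any t-wise uniform set has at least n!/(n-t)! members; the full symmetric group (n! members) is the trivial example, and for fixed t a size bound like $t^{2n}$ is far below n! once n is large.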

arXiv:1111.0472

doi 10.1142/S1793525313500088

Finite Voronoi decompositions of infinite vertex transitive graphs

Authors: Hilary Finucane

Abstract: In this paper, we consider the Voronoi decompositions of an arbitrary infinite vertex-transitive graph $G$. In particular, we are interested in the following question: what is the largest number of Voronoi cells that must be infinite, given sufficiently (but finitely) many Voronoi sites which are sufficiently far from each other? We call this number the survival number $s(G)$. The survival number of a graph has an alternative characterization in terms of covering, which we use to show that $s(G)$ is always at least two. The survival number is not a quasi-isometry invariant, but it remains open whether finiteness of $s(G)$ is. We show that all vertex-transitive graphs with polynomial growth have a finite $s(G)$; vertex-transitive graphs with infinitely many ends have an infinite $s(G)$; the lamplighter graph $LL(\mathbb{Z})$, which has exponential growth, has a finite $s(G)$; and the lamplighter graph $LL(\mathbb{Z}^2)$, which is Liouville, has an infinite $s(G)$.

Submitted 2 November, 2011; originally announced November 2011.
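A Voronoi decomposition of a graph assigns each vertex to the cell of its nearest site in graph distance. The sketch below computes the cells on a finite graph by running breadth-first search from every site. The tie rule (an equidistant vertex joins every nearest cell) is our assumption, the function names are hypothetical, and since the survival number concerns infinite graphs, a finite window can only hint at which cells would be infinite.

```python
from collections import deque


def bfs_distances(adj, source):
    """Graph distance from `source` to every vertex reachable in the finite graph `adj`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def voronoi_cells(adj, sites):
    """Map each site to the set of vertices for which it is a nearest site.
    A vertex equidistant from several sites is placed in all of their cells."""
    dist_from = {s: bfs_distances(adj, s) for s in sites}
    cells = {s: set() for s in sites}
    for v in adj:
        reachable = {s: d[v] for s, d in dist_from.items() if v in d}
        if not reachable:
            continue  # v lies in a component containing no site
        nearest = min(reachable.values())
        for s, d in reachable.items():
            if d == nearest:
                cells[s].add(v)
    return cells


if __name__ == "__main__":
    # A finite window of the grid Z^2: vertices (x, y) with |x|, |y| <= 10.
    R = 10
    grid = {(x, y): [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if abs(x + dx) <= R and abs(y + dy) <= R]
            for x in range(-R, R + 1) for y in range(-R, R + 1)}
    cells = voronoi_cells(grid, [(-5, 0), (5, 0), (0, 6)])
    print({s: len(c) for s, c in cells.items()})
```

On the path 0, 1, ..., 20 with sites 5 and 15, the cells are {0, ..., 10} and {10, ..., 20}; on the infinite line Z, the two outermost cells of any set of sites are infinite rays.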

arXiv:1105.5569

doi 10.1016/j.spa.2014.03.012

Scenery Reconstruction on Finite Abelian Groups

Authors: Hilary Finucane, Omer Tamuz, Yariv Yaari

Abstract: We consider the question of when a random walk on a finite abelian group with a given step distribution can be used to reconstruct a binary labeling of the elements of the group, up to a shift. Matzinger and Lember (2006) give a sufficient condition for reconstructibility on cycles. While, as we show, this condition is not in general necessary, our main result is that it is necessary when the length of the cycle is prime and larger than 5, and the step distribution has only rational probabilities. We extend this result to other abelian groups.

Submitted 30 April, 2014; v1 submitted 27 May, 2011; originally announced May 2011.

Comments: 16 pages, 2 figures

Journal ref: Stochastic Processes and their Applications, 124(8):2754-2770, 2014
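The phrase "up to a shift" reflects an invariance that can be checked exactly on a small cycle: if the walk starts at a uniformly random vertex, shifting the labeling does not change the law of the observed label sequence. The sketch below computes the exact probability of every length-k observation word with rational arithmetic. The uniform starting point and the function name are our assumptions for illustration, not the paper's exact setup.

```python
from fractions import Fraction
from itertools import product


def word_distribution(labels, step_probs, k):
    """labels: 0/1 labeling f of Z_n as a list; step_probs: {step: probability}.
    Returns P(f(X_0), ..., f(X_{k-1}) = w) for every binary word w of length k,
    where X is the random walk with these steps and X_0 is uniform on Z_n."""
    n = len(labels)
    dist = {}
    for word in product((0, 1), repeat=k):
        # alpha[x] = P(first t observations match word[:t] and X_{t-1} = x)
        alpha = [Fraction(1, n) if labels[x] == word[0] else Fraction(0)
                 for x in range(n)]
        for symbol in word[1:]:
            nxt = [Fraction(0)] * n
            for x, a in enumerate(alpha):
                if a:
                    for step, p in step_probs.items():
                        y = (x + step) % n
                        if labels[y] == symbol:
                            nxt[y] += a * p
            alpha = nxt
        dist[word] = sum(alpha)
    return dist
```

A related check shows why some condition on the step distribution is needed: for a symmetric step distribution, the reflected labeling x ↦ f(-x) yields the same law, and for the labeling [1, 1, 0, 1, 0, 0, 0] on Z_7 that reflection is not a shift.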

arXiv:1009.0909

Comparing Pedigree Graphs

Authors: Bonnie Kirkpatrick, Yakir Reshef, Hilary Finucane, Haitao Jiang, Binhai Zhu, Richard M. Karp

Abstract: Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this paper, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present 1) several algorithms that are fast and exact in various special cases, and 2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.

Submitted 18 October, 2011; v1 submitted 5 September, 2010; originally announced September 2010.
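To make the leaf-labeled isomorphism problem concrete, here is a brute-force reference check over all bijections of the ancestral (unlabeled) individuals. It runs in factorial time and is not the linear-time algorithm from the paper. Representing a pedigree as a set of (parent, child) pairs, ignoring the sex of parents, and the function name are our assumptions.

```python
from itertools import permutations


def leaf_labeled_isomorphic(edges_a, edges_b, leaves):
    """Brute-force test of whether two leaf-labeled pedigrees are isomorphic.

    edges_a, edges_b: sets of (parent, child) pairs.
    leaves: set of labels of extant individuals, which must map to themselves;
    ancestral individuals may be renamed by any bijection."""
    nodes_a = {v for edge in edges_a for v in edge}
    nodes_b = {v for edge in edges_b for v in edge}
    if len(edges_a) != len(edges_b) or (nodes_a & leaves) != (nodes_b & leaves):
        return False
    internal_a = list(nodes_a - leaves)
    internal_b = list(nodes_b - leaves)
    if len(internal_a) != len(internal_b):
        return False
    for image in permutations(internal_b):
        mapping = {leaf: leaf for leaf in leaves}
        mapping.update(zip(internal_a, image))
        if {(mapping[p], mapping[c]) for p, c in edges_a} == edges_b:
            return True
    return False
```

A checker like this is mainly useful as a test oracle for a faster implementation on small simulated pedigrees.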

Showing 1–8 of 8 results for author: Finucane, H