
Maximum Likelihood for Gaussian Process Classification and Generalized Linear Mixed Models under Case-Control Sampling

Authors: Omer Weissbrod, Shachar Kaufman, David Golan, Saharon Rosset

Abstract: Modern data sets in various domains often include units that were sampled non-randomly from the population and have a latent correlation structure. Here we investigate a common form of this setting, where every unit is associated with a latent variable, all latent variables are correlated, and the probability of sampling a unit depends on its response. Such settings often arise in case-control studies, where the sampled units are correlated due to spatial proximity, family relations, or other sources of relatedness. Maximum likelihood estimation in such settings is challenging from both a computational and statistical perspective, necessitating approximations that take the sampling scheme into account. We propose a family of approximate likelihood approaches which combine composite likelihood and expectation propagation. We demonstrate the efficacy of our solutions via extensive simulations. We utilize them to investigate the genetic architecture of several complex disorders collected in case-control genetic association studies, where hundreds of thousands of genetic variants are measured for every individual, and the underlying disease liabilities of individuals are correlated due to genetic similarity. Our work is the first to provide a tractable likelihood-based solution for case-control data with complex dependency structures.

Submitted 24 April, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

Journal ref: JMLR (108):1-30, 2019

arXiv:1410.6758

Consistent distribution-free $K$-sample and independence tests for univariate random variables

Authors: Ruth Heller, Yair Heller, Shachar Kaufman, Barak Brill, Malka Gorfine

Abstract: A popular approach for testing if two univariate random variables are statistically independent consists of partitioning the sample space into bins, and evaluating a test statistic on the binned data. The partition size matters, and the optimal partition size is data dependent. While for detecting simple relationships coarse partitions may be best, for detecting complex relationships a great gain in power can be achieved by considering finer partitions. We suggest novel consistent distribution-free tests that are based on summation or maximization aggregation of scores over all partitions of a fixed size. We show that our test statistics based on summation can serve as good estimators of the mutual information. Moreover, we suggest regularized tests that aggregate over all partition sizes, and prove those are consistent too. We provide polynomial-time algorithms, which are critical for computing the suggested test statistics efficiently. We show that the power of the regularized tests is excellent compared to existing tests, and almost as powerful as the tests based on the optimal (yet unknown in practice) partition size, in simulations as well as on a real data example.

Submitted 18 June, 2015; v1 submitted 24 October, 2014; originally announced October 2014.

Comments: arXiv admin note: substantial text overlap with arXiv:1308.1559

Journal ref: Journal of Machine Learning Research (JMLR) 2016, vol. 17, No. 29, 1-54

arXiv:1311.2791

When Does More Regularization Imply Fewer Degrees of Freedom? Sufficient Conditions and Counter Examples from Lasso and Ridge Regression

Authors: Shachar Kaufman, Saharon Rosset

Abstract: Regularization aims to improve prediction performance of a given statistical modeling approach by moving to a second approach which achieves worse training error but is expected to have fewer degrees of freedom, i.e., better agreement between training and prediction error. We show here, however, that this expected behavior does not hold in general. In fact, counter examples are given that show regularization can increase the degrees of freedom in simple situations, including lasso and ridge regression, which are the most common regularization approaches in use. In such situations, the regularization increases both training error and degrees of freedom, and is thus inherently without merit. On the other hand, two important regularization scenarios are described where the expected reduction in degrees of freedom is indeed guaranteed: (a) all symmetric linear smoothers, and (b) linear regression versus convex constrained linear regression (as in the constrained variant of ridge regression and lasso).

Submitted 12 November, 2013; originally announced November 2013.

Comments: Main text: 15 pages, 2 figures; Supplementary material is included at the end of the main text: 9 pages, 7 figures

arXiv:1308.1559

Consistent distribution-free tests of association between univariate random variables

Authors: Ruth Heller, Yair Heller, Shachar Kaufman, Malka Gorfine

Abstract: We consider the problem of testing whether pairs of univariate random variables are associated. Few tests of independence exist that are consistent against all dependent alternatives and are distribution free. We propose novel tests that are consistent, distribution free, and have excellent power properties. The tests have simple form, and are surprisingly computationally efficient thanks to accompanying innovative algorithms we develop. Moreover, we show that one of the test statistics is a consistent estimator of the mutual information. We demonstrate the good power properties in simulations, and apply the tests to a microarray study where many pairs of genes are examined simultaneously for co-dependence.

Submitted 8 December, 2014; v1 submitted 7 August, 2013; originally announced August 2013.

Comments: The paper has been withdrawn, since we submitted a new manuscript arXiv:1410.6758 that includes this work but is far more general, thus it included also many new results, and therefore should be read instead of this work

Showing 1–4 of 4 results for author: Kaufman, S