arXiv:2004.08159 [pdf, other]

Detection and Estimation of Local Signals

Abstract: We study the maximum score statistic to detect and estimate local signals in the form of change-points in the level, slope, or other property of a sequence of observations, and to segment the sequence when there appear to be multiple changes. We find that when observations are serially dependent, the change-points can lead to upwardly biased estimates of autocorrelations, resulting in a sometimes serious loss of power. Examples involving temperature variations, the level of atmospheric greenhouse gases, suicide rates, incidence of COVID-19, and excess deaths during the pandemic illustrate the general theory.

Submitted 2 November, 2021; v1 submitted 17 April, 2020; originally announced April 2020.

arXiv:1411.1437 [pdf, ps, other]

doi 10.1214/15-AOS1312

Higher criticism: $p$-values and criticism

Authors: Jian Li, David Siegmund

Abstract: This paper compares the higher criticism statistic (Donoho and Jin [Ann. Statist. 32 (2004) 962-994]), a modification of the higher criticism statistic also suggested by Donoho and Jin, and two statistics of the Berk-Jones [Z. Wahrsch. Verw. Gebiete 47 (1979) 47-59] type. New approximations to the significance levels of the statistics are derived, and their accuracy is studied by simulations. By numerical examples it is shown that over a broad range of sample sizes the Berk-Jones statistics have a better power function than the higher criticism statistics to detect sparse mixtures. The applications suggested by Meinshausen and Rice [Ann. Statist. 34 (2006) 373-393], to find lower confidence bounds for the number of false hypotheses, and by Jeng, Cai and Li [Biometrika 100 (2013) 157-172], to detect copy number variants, are also studied.

Submitted 3 June, 2015; v1 submitted 5 November, 2014; originally announced November 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1312 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1312

Journal ref: Annals of Statistics 2015, Vol. 43, No. 3, 1323-1350

arXiv:1406.3258 [pdf, other]

Scanning a Poisson Random Field for Local Signals

Authors: Nancy R. Zhang, Benjamin Yakir, Charlie L. Xia, David Siegmund

Abstract: The detection of local genomic signals using high-throughput DNA sequencing data can be cast as a problem of scanning a Poisson random field for local changes in the rate of the process. We propose a likelihood-based framework for such scans, and derive formulas for false positive rate control and power calculations. The framework can also accommodate mixtures of Poisson processes to deal with over-dispersion. As a specific, detailed example, we consider the detection of insertions and deletions by paired-end DNA-sequencing. We propose several statistics for this problem, compare their power under current experimental designs, and illustrate their application on an Illumina Platinum Genomes data set.

Submitted 12 June, 2014; originally announced June 2014.

arXiv:1108.3177 [pdf, ps, other]

doi 10.1214/10-AOAS400

Detecting simultaneous variant intervals in aligned sequences

Authors: David Siegmund, Benjamin Yakir, Nancy R. Zhang

Abstract: Given a set of aligned sequences of independent noisy observations, we are concerned with detecting intervals where the mean values of the observations change simultaneously in a subset of the sequences. The intervals of changed means are typically short relative to the length of the sequences, the subset where the change occurs, the "carriers," can be relatively small, and the sizes of the changes can vary from one sequence to another. This problem is motivated by the scientific problem of detecting inherited copy number variants in aligned DNA samples. We suggest a statistic based on the assumption that for any given interval of changed means there is a given fraction of samples that carry the change. We derive an analytic approximation for the false positive error probability of a scan, which is shown by simulations to be reasonably accurate. We show that the new method usually improves on methods that analyze a single sample at a time and on our earlier multi-sample method, which is most efficient when the carriers form a large fraction of the set of sequences. The proposed procedure is also shown to be robust with respect to the assumed fraction of carriers of the changes.

Submitted 16 August, 2011; originally announced August 2011.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS400 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS400

Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 2A, 645-668

Showing 1–4 of 4 results for author: Siegmund, D