-
Detection and Estimation of Local Signals
Authors:
Xiao Fang,
David Siegmund
Abstract:
We study the maximum score statistic to detect and estimate local signals in the form of change-points in the level, slope, or other property of a sequence of observations, and to segment the sequence when there appear to be multiple changes. We find that when observations are serially dependent, the change-points can lead to upwardly biased estimates of autocorrelations, resulting in a sometimes serious loss of power. Examples involving temperature variations, the level of atmospheric greenhouse gases, suicide rates, incidence of COVID-19, and excess deaths during the pandemic illustrate the general theory.
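As a rough illustration of a maximum score scan for a single change in level, the sketch below standardizes the centered partial sums S_t - (t/n) S_n and takes the maximum in absolute value. It assumes independent observations with unit variance and a single change. The function name `max_score_level_change` and the `min_seg` parameter are hypothetical. It does not cover slope changes, serial dependence, or segmentation into multiple changes, all of which the paper treats.

```python
import numpy as np


def max_score_level_change(x, min_seg=2):
    """Locate one change in level by maximizing the standardized score.

    Simplifying assumptions (hypothetical, not from the paper): independent
    observations with unit variance and a single change in level.
    Returns (t_hat, z), where the change occurs after observation t_hat.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.cumsum(x)                    # partial sums S_1, ..., S_n
    t = np.arange(1, n)                 # candidate change-points 1, ..., n-1
    num = s[:-1] - (t / n) * s[-1]      # centered partial sum S_t - (t/n) S_n
    sd = np.sqrt(t * (n - t) / n)       # null standard deviation of num
    z = num / sd
    # Ignore change-points that leave fewer than min_seg points on either side.
    allowed = (t >= min_seg) & (t <= n - min_seg)
    k = int(np.argmax(np.where(allowed, np.abs(z), -np.inf)))
    return int(t[k]), float(z[k])


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
t_hat, z = max_score_level_change(x)
print(t_hat, round(z, 2))
```

The sketch ignores serial dependence. The abstract's point is that when such dependence is present, autocorrelation estimates computed from data that contain change-points can be biased upward, which can cost power.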
Submitted 2 November, 2021; v1 submitted 17 April, 2020;
originally announced April 2020.
-
Higher criticism: $p$-values and criticism
Authors:
Jian Li,
David Siegmund
Abstract:
This paper compares the higher criticism statistic (Donoho and Jin [Ann. Statist. 32 (2004) 962-994]), a modification of the higher criticism statistic also suggested by Donoho and Jin, and two statistics of the Berk-Jones [Z. Wahrsch. Verw. Gebiete 47 (1979) 47-59] type. New approximations to the significance levels of the statistics are derived, and their accuracy is studied by simulations. By numerical examples it is shown that over a broad range of sample sizes the Berk-Jones statistics have a better power function than the higher criticism statistics to detect sparse mixtures. The applications suggested by Meinshausen and Rice [Ann. Statist. 34 (2006) 373-393], to find lower confidence bounds for the number of false hypotheses, and by Jeng, Cai and Li [Biometrika 100 (2013) 157-172], to detect copy number variants, are also studied.
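For orientation, here is a minimal sketch of the two families of statistics being compared, computed from sorted p-values p_(1) ≤ ... ≤ p_(n). It uses the basic higher criticism form sqrt(n)(i/n - p_(i)) / sqrt(p_(i)(1 - p_(i))) and a common one-sided Berk-Jones form n K(i/n, p_(i)) restricted to p_(i) < i/n, where K is the Bernoulli Kullback-Leibler divergence. The range restriction `alpha0` and all function names are assumptions for illustration. The paper's modified statistics and its significance approximations are not reproduced here.

```python
import numpy as np


def _bernoulli_kl(a, b):
    """K(a, b) = a log(a/b) + (1 - a) log((1 - a)/(1 - b)), with 0 log 0 = 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        first = np.where(a > 0, a * np.log(a / b), 0.0)
        second = np.where(a < 1, (1 - a) * np.log((1 - a) / (1 - b)), 0.0)
    return first + second


def higher_criticism(pvals, alpha0=0.5):
    """Basic higher criticism over the smallest alpha0 * n p-values."""
    p = np.sort(np.asarray(pvals, dtype=float))   # assumes 0 < p < 1
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    return float(hc[i <= alpha0 * n].max())


def berk_jones(pvals, alpha0=0.5):
    """One-sided Berk-Jones: max of n * K(i/n, p_(i)) over i with p_(i) < i/n."""
    p = np.sort(np.asarray(pvals, dtype=float))   # assumes 0 < p < 1
    n = p.size
    i = np.arange(1, n + 1)
    frac = i / n
    stat = n * _bernoulli_kl(frac, p)
    keep = (i <= alpha0 * n) & (p < frac)
    return float(stat[keep].max()) if keep.any() else 0.0


rng = np.random.default_rng(1)
p = rng.uniform(size=1000)
p[:10] = rng.uniform(0, 1e-4, size=10)   # a few strong signals (illustrative)
print(higher_criticism(p), berk_jones(p))
```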
Submitted 3 June, 2015; v1 submitted 5 November, 2014;
originally announced November 2014.
-
Scanning a Poisson Random Field for Local Signals
Authors:
Nancy R. Zhang,
Benjamin Yakir,
Charlie L. Xia,
David Siegmund
Abstract:
The detection of local genomic signals using high-throughput DNA sequencing data can be cast as a problem of scanning a Poisson random field for local changes in the rate of the process. We propose a likelihood-based framework for such scans, and derive formulas for false positive rate control and power calculations. The framework can also accommodate mixtures of Poisson processes to deal with over-dispersion. As a specific, detailed example, we consider the detection of insertions and deletions by paired-end DNA-sequencing. We propose several statistics for this problem, compare their power under current experimental designs, and illustrate their application on an Illumina Platinum Genomes data set.
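A minimal sketch of a likelihood-based scan for a local rate increase in binned Poisson counts: for each window with observed count k and null expected count λ, the log-likelihood ratio is k log(k/λ) - (k - λ) when k > λ, and 0 otherwise. The constant known baseline rate, the binning, and the `max_width` limit are simplifying assumptions. The function `poisson_scan` is hypothetical and is not one of the paper's statistics for paired-end data. It also omits the over-dispersion mixture and the false positive rate formulas.

```python
import numpy as np


def poisson_scan(counts, baseline_rate, max_width):
    """Return (llr, start, end) for the window [start, end) of binned counts
    with the largest Poisson log-likelihood ratio for an increased rate.

    Assumes a known, constant baseline rate per bin (a simplification)."""
    c = np.asarray(counts, dtype=float)
    n = c.size
    cs = np.concatenate(([0.0], np.cumsum(c)))   # cs[j] = sum of c[:j]
    best = (0.0, None, None)
    for w in range(1, min(max_width, n) + 1):
        k = cs[w:] - cs[:-w]              # counts in windows [s, s + w)
        lam = baseline_rate * w           # expected count under the null
        with np.errstate(divide="ignore", invalid="ignore"):
            llr = np.where(k > lam, k * np.log(k / lam) - (k - lam), 0.0)
        j = int(np.argmax(llr))
        if llr[j] > best[0]:
            best = (float(llr[j]), j, j + w)
    return best


rng = np.random.default_rng(2)
counts = rng.poisson(2.0, size=500)
counts[200:210] += rng.poisson(4.0, size=10)   # injected local signal
print(poisson_scan(counts, baseline_rate=2.0, max_width=50))
```

Using cumulative sums makes every window count a single subtraction, so scanning all windows up to width W costs O(nW).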
Submitted 12 June, 2014;
originally announced June 2014.
-
Detecting simultaneous variant intervals in aligned sequences
Authors:
David Siegmund,
Benjamin Yakir,
Nancy R. Zhang
Abstract:
Given a set of aligned sequences of independent noisy observations, we are concerned with detecting intervals where the mean values of the observations change simultaneously in a subset of the sequences. The intervals of changed means are typically short relative to the length of the sequences, the subset where the change occurs, the "carriers," can be relatively small, and the sizes of the changes can vary from one sequence to another. This problem is motivated by the scientific problem of detecting inherited copy number variants in aligned DNA samples. We suggest a statistic based on the assumption that for any given interval of changed means there is a given fraction of samples that carry the change. We derive an analytic approximation for the false positive error probability of a scan, which is shown by simulations to be reasonably accurate. We show that the new method usually improves on methods that analyze a single sample at a time and on our earlier multi-sample method, which is most efficient when the carriers form a large fraction of the set of sequences. The proposed procedure is also shown to be robust with respect to the assumed fraction of carriers of the changes.
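To illustrate a statistic built on an assumed fraction p0 of carriers, the sketch below sums, over samples, the log of a two-component mixture, log(1 - p0 + p0 exp(U_i^2 / 2)). Here U_i is the standardized sum of sample i over a candidate interval. It assumes unit variance and zero baseline mean in every sample. The exact form of the paper's statistic (for example, one-sided versus two-sided) may differ, and `multisample_scan` is a hypothetical name. Thresholds from the paper's analytic false positive approximation are not included.

```python
import numpy as np


def multisample_scan(y, p0, max_width):
    """Scan aligned sequences y (shape: samples x positions) for the interval
    [start, end) maximizing sum_i log(1 - p0 + p0 * exp(U_i**2 / 2)).

    Assumes unit variance and zero baseline mean in every sample."""
    y = np.asarray(y, dtype=float)
    n_samples, n_pos = y.shape
    cs = np.concatenate([np.zeros((n_samples, 1)), np.cumsum(y, axis=1)], axis=1)
    best = (-np.inf, None, None)
    for w in range(1, min(max_width, n_pos) + 1):
        u = (cs[:, w:] - cs[:, :-w]) / np.sqrt(w)    # standardized interval sums
        # log(1 - p0 + p0 * exp(u^2 / 2)) computed stably with logaddexp
        terms = np.logaddexp(np.log1p(-p0), np.log(p0) + u ** 2 / 2)
        z = terms.sum(axis=0)                        # combine evidence across samples
        j = int(np.argmax(z))
        if z[j] > best[0]:
            best = (float(z[j]), j, j + w)
    return best


rng = np.random.default_rng(3)
y = rng.normal(size=(30, 200))
y[:5, 100:110] += 2.0   # a shared change carried by 5 of 30 samples
print(multisample_scan(y, p0=0.1, max_width=30))
```

Non-carriers with U_i near 0 contribute roughly log(1) = 0, so a few strong carriers are not swamped by many null samples. This matches the motivation for assuming a fraction of carriers rather than a change in every sample.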
Submitted 16 August, 2011;
originally announced August 2011.