-
Bootstrapping multiple systems estimates to account for model selection
Authors:
Bernard W. Silverman,
Lax Chan,
Kyle Vincent
Abstract:
Multiple systems estimation using a Poisson loglinear model is a standard approach to quantifying hidden populations where data sources are based on lists of known cases. Information criteria are often used for selecting between the large number of possible models. Confidence intervals are often reported conditional on the model selected, providing an over-optimistic impression of estimation accuracy. A bootstrap approach is a natural way to account for the model selection. However, because the model selection step has to be carried out for every bootstrap replication, there may be a high or even prohibitive computational burden. We explore the merit of modifying the model selection procedure in the bootstrap to look only among a subset of models, chosen on the basis of their information criterion score on the original data. This provides large computational gains with little apparent effect on inference. We also incorporate rigorous and economical ways of approaching issues of the existence of estimators when applying the method to sparse data tables.
Submitted 18 October, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
The vote Package: Single Transferable Vote and Other Electoral Systems in R
Authors:
Adrian E. Raftery,
Hana Ševčíková,
Bernard W. Silverman
Abstract:
We describe the vote package in R, which implements the plurality (or first-past-the-post), two-round runoff, score, approval and single transferable vote (STV) electoral systems, as well as methods for selecting the Condorcet winner and loser. We emphasize the STV system, which we have found to work well in practice for multi-winner elections with small electorates, such as committee and council elections, and the selection of multiple job candidates. For single-winner elections, the STV is also called instant runoff voting (IRV), ranked choice voting (RCV), or the alternative vote (AV) system. The package also implements the STV system with equal preferences, for the first time in a software package, to our knowledge. It also implements a new variant of STV, in which a minimum number of candidates from a specified group are required to be elected. We illustrate the package with several real examples.
Submitted 10 February, 2021;
originally announced February 2021.
-
Model fitting in Multiple Systems Analysis for the quantification of Modern Slavery: Classical and Bayesian approaches
Authors:
Bernard W. Silverman
Abstract:
Multiple systems estimation is a key approach for quantifying hidden populations such as the number of victims of modern slavery. The UK Government published an estimate of 10,000 to 13,000 victims, constructed by the present author, as part of the strategy leading to the Modern Slavery Act 2015. This estimate was obtained by a stepwise multiple systems method based on six lists. Further investigation shows that a small proportion of the possible models give rather different answers, and that other model fitting approaches may choose one of these. Three data sets collected in the field of modern slavery, together with a data set about the death toll in the Kosovo conflict, are used to investigate the stability and robustness of various multiple systems estimate methods. The crucial aspect is the way that interactions between lists are modelled, because these can substantially affect the results. Model selection and Bayesian approaches are considered in detail, in particular to assess their stability and robustness when applied to real modern slavery data. A new Markov Chain Monte Carlo Bayesian approach is developed; overall, this gives robust and stable results at least for the examples considered. The software and datasets are freely and publicly available to facilitate wider implementation and further research.
Submitted 11 August, 2019; v1 submitted 16 February, 2019;
originally announced February 2019.
-
Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists
Authors:
Lax Chan,
Bernard W. Silverman,
Kyle Vincent
Abstract:
Multiple systems estimation strategies have recently been applied to quantify hard-to-reach populations, particularly when estimating the number of victims of human trafficking and modern slavery. In such contexts, it is not uncommon to see sparse or even no overlap between some of the lists on which the estimates are based. These create difficulties in model fitting and selection, and we develop inference procedures to address these challenges. The approach is based on Poisson log-linear regression modeling. Issues investigated in detail include taking proper account of data sparsity in the estimation procedure, as well as the existence and identifiability of maximum likelihood estimates. A stepwise method for choosing the most suitable parameters is developed, together with a bootstrap approach to finding confidence intervals for the total population size. We apply the strategy to two empirical data sets of trafficking in US regions, and find that the approach results in stable, reasonable estimates. An accompanying R software implementation has been made publicly available.
Submitted 14 December, 2019; v1 submitted 13 February, 2019;
originally announced February 2019.
-
Julian Ernst Besag, 26 March 1945 -- 6 August 2010, a biographical memoir
Authors:
Peter J. Diggle,
Peter J. Green,
Bernard W. Silverman
Abstract:
Julian Besag was an outstanding statistical scientist, distinguished for his pioneering work on the statistical theory and analysis of spatial processes, especially conditional lattice systems. His work has been seminal in statistical developments over the last several decades ranging from image analysis to Markov chain Monte Carlo methods. He clarified the role of auto-logistic and auto-normal models as instances of Markov random fields and paved the way for their use in diverse applications. Later work included investigations into the efficacy of nearest neighbour models to accommodate spatial dependence in the analysis of data from agricultural field trials, image restoration from noisy data, and texture generation using lattice models.
Submitted 2 January, 2018; v1 submitted 28 November, 2017;
originally announced November 2017.
-
Perfect simulation using dominated coupling from the past with application to area-interaction point processes and wavelet thresholding
Authors:
Graeme K. Ambler,
Bernard W. Silverman
Abstract:
We consider perfect simulation algorithms for locally stable point processes based on dominated coupling from the past, and apply these methods in two different contexts. A new version of the algorithm is developed which is feasible for processes which are neither purely attractive nor purely repulsive. Such processes include multiscale area-interaction processes, which are capable of modelling point patterns whose clustering structure varies across scales. The other topic considered is nonparametric regression using wavelets, where we use a suitable area-interaction process on the discrete space of indices of wavelet coefficients to model the notion that if one wavelet coefficient is non-zero then it is more likely that neighbouring coefficients will be also. A method based on perfect simulation within this model shows promising results compared to the standard methods which threshold coefficients independently.
Submitted 28 February, 2010;
originally announced March 2010.
-
Comment: Bibliometrics in the Context of the UK Research Assessment Exercise
Authors:
Bernard W. Silverman
Abstract:
Research funding and reputation in the UK have, for over two decades, been increasingly dependent on a regular peer-review of all UK departments. This is to move to a system more based on bibliometrics. Assessment exercises of this kind influence the behavior of institutions, departments and individuals, and therefore bibliometrics will have effects beyond simple measurement. [arXiv:0910.3529]
Submitted 19 October, 2009;
originally announced October 2009.
-
Perfect simulation for Bayesian wavelet thresholding with correlated coefficients
Authors:
Graeme K. Ambler,
Bernard W. Silverman
Abstract:
We introduce a new method of Bayesian wavelet shrinkage for reconstructing a signal when we observe a noisy version. Rather than making the common assumption that the wavelet coefficients of the signal are independent, we allow for the possibility that they are locally correlated in both location (time) and scale (frequency). This leads us to a prior structure which is analytically intractable, but it is possible to draw independent samples from a close approximation to the posterior distribution by an approach based on Coupling From The Past.
Submitted 15 March, 2009;
originally announced March 2009.
-
Perfect simulation of spatial point processes using dominated coupling from the past with application to a multiscale area-interaction point process
Authors:
Graeme K. Ambler,
Bernard W. Silverman
Abstract:
We consider perfect simulation algorithms for locally stable point processes based on dominated coupling from the past. A version of the algorithm is developed which is feasible for processes which are neither purely attractive nor purely repulsive. Such processes include multiscale area-interaction processes, which are capable of modelling point patterns whose clustering structure varies across scales. We prove correctness of the algorithm and existence of these processes. An application to the redwood seedlings data is discussed.
Submitted 15 March, 2009;
originally announced March 2009.