-
Sequential Spatially Balanced Sampling
Authors:
Raphaël Jauslin,
Bardia Panahbehagh,
Yves Tillé
Abstract:
Sequential sampling occurs when the entire population is not known in advance and data are obtained one at a time or in groups of units. This manuscript proposes a new algorithm to sequentially select a balanced sample. The algorithm respects equal and unequal inclusion probabilities. The method can also be used to select a spatially balanced sample if the population of interest contains spatial coordinates. A simulation study is proposed and the results show that the proposed method outperforms other methods.
Submitted 1 June, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Stream Sampling with Immediate Decision
Authors:
Bardia Panahbehagh,
Raphaël Jauslin,
Yves Tillé
Abstract:
The manuscript introduces a method to select a random sample from a stream by deciding on each sampling unit immediately after observing it. The process can be applied to unequal as well as equal probability sampling. The implementation is straightforward: the algorithm selects a unit for the sample based on a single condition. It is particularly effective for making direct decisions on stream data, despite the data arriving in groups or the stream being linear.
Submitted 17 November, 2021;
originally announced November 2021.
-
Sequential Unequal Probability Sampling For Stream Population
Authors:
Bardia Panahbehagh,
Raphaël Jauslin,
Yves Tillé
Abstract:
A new unequal probability sampling method is proposed. This method is sequential. The decision whether to select each unit is made based on the order in which the units appear. A variant of this method allows selecting a sample from a stream. At each step, the decision whether to take a unit is made successively, according to the order of appearance in the stream. This method involves using a sliding window that is as small as possible. The method also allows the sample to be spread and even the level of spreading to be adjusted.
Submitted 16 November, 2021;
originally announced November 2021.
-
An Efficient Approach for Statistical Matching of Survey Data Through Calibration, Optimal Transport and Balanced Sampling
Authors:
Raphaël Jauslin,
Yves Tillé
Abstract:
Statistical matching aims to integrate two statistical sources. These sources can be two samples or a sample and the entire population. If two samples have been selected from the same population and information has been collected on different variables of interest, then it is interesting to match the two surveys to analyse, for example, contingency tables or covariances. In this paper, we propose an efficient method for matching two samples that may each contain a weighting scheme. The method matches the records of the two sources. Several variants are proposed in order to create a directly usable file integrating data from both information sources.
Submitted 10 March, 2022; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Enhanced Cube Implementation For Highly Stratified Population
Authors:
Raphaël Jauslin,
Esther Eustache,
Yves Tillé
Abstract:
A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the estimators. We propose here a new method to handle the selection of a balanced sample in a highly stratified population. The method substantially improves on the commonly used sampling design and reduces the time-consuming problem that could arise if inclusion probabilities within strata do not sum to an integer.
Submitted 14 January, 2021;
originally announced January 2021.
-
Spatial Spread Sampling Using Weakly Associated Vectors
Authors:
Raphaël Jauslin,
Yves Tillé
Abstract:
Geographical data are generally autocorrelated. In this case, it is preferable to select spread units. In this paper, we propose a new method for selecting well-spread samples from a finite spatial population with equal or unequal inclusion probabilities. The proposed method is based on the definition of a spatial structure by using a stratification matrix. Our method exactly satisfies given inclusion probabilities and provides samples that are very well-spread. A set of simulations shows that our method outperforms other existing methods such as the Generalized Random Tessellation Stratified (GRTS) or the Local Pivotal Method (LPM). Analysis of the variance on a real dataset shows that our method is more accurate than these two. Furthermore, a variance estimator is proposed.
Submitted 28 July, 2020; v1 submitted 29 October, 2019;
originally announced October 2019.
-
Measuring the spatial balance of a sample: A new measure based on the Moran's I index
Authors:
Yves Tillé,
Maria Michela Dickson,
Giuseppe Espa,
Diego Giuliani
Abstract:
Measuring the degree of spatial spreading of a sample can be of great interest when sampling from a spatial population. The commonly used spatial balance index by Grafström et al. (2012) is particularly effective in comparing the level of spatial spreading of different samples from the same population. However, its unbounded and uninterpretable scale of measurement does not allow the level of spatial spreading to be assessed in absolute terms and confines its use to raw comparisons only. In this paper, we introduce a new absolute measure of the spatial spreading of a sample using a normalized version of the Moran's $I$ index. The properties and behaviour of the proposed measure are analysed through two simulation experiments, one based on artificial populations and the other on a population of real business units located in the province of Siena (Italy).
Submitted 27 November, 2017; v1 submitted 12 October, 2017;
originally announced October 2017.
-
Sampling Designs on Finite Populations with Spreading Control Parameters
Authors:
Yves Tillé,
Lionel Qualité,
Matthieu Wilhelm
Abstract:
We present new sampling methods for finite populations that make it possible to control the joint inclusion probabilities of units and especially the spreading of sampled units in the population. They are based on the use of renewal chains and multivariate discrete distributions to generate the difference of population ranks between two successive selected units. With a Bernoulli sampling design, these differences follow a geometric distribution, and with a simple random sampling design they follow a negative hypergeometric distribution. We propose to use other distributions and introduce a large class of sampling designs with and without fixed sample size. The choice of the rank-difference distribution allows us to control the units' joint inclusion probabilities with a relatively simple method and a closed-form formula. Joint inclusion probabilities of neighboring units can be chosen to be larger, or smaller, compared to those of Bernoulli or simple random sampling, thus allowing the sample to be spread more or less over the population. This can be useful when neighboring units have similar characteristics or, on the contrary, are very different. A set of simulations illustrates the qualities of this method.
Submitted 11 April, 2017; v1 submitted 10 January, 2017;
originally announced January 2017.
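The Bernoulli case mentioned in the abstract above can be sketched as follows. This is only an illustration of drawing a sample through rank differences, not the authors' implementation: the function names `geometric_gap` and `sample_by_gaps`, and the inverse-CDF draw, are assumptions made for this example. Each gap between successive selected ranks is drawn from a geometric distribution on {1, 2, ...}, which reproduces a Bernoulli design in which every unit has inclusion probability `p`.

```python
import math
import random


def geometric_gap(p, rng):
    """Draw a gap from the geometric distribution on {1, 2, ...} (hypothetical helper).

    Uses the inverse CDF: P(gap <= k) = 1 - (1 - p)**k.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


def sample_by_gaps(N, p, rng):
    """Select ranks in 1..N by accumulating geometric rank differences.

    The resulting design is equivalent to Bernoulli sampling with
    inclusion probability p for every unit (sample size is random).
    """
    sample = []
    k = geometric_gap(p, rng)  # rank of the first selected unit
    while k <= N:
        sample.append(k)
        k += geometric_gap(p, rng)  # jump to the next selected rank
    return sample


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_by_gaps(30, 0.2, rng))
```

Replacing `geometric_gap` with another gap distribution is the point at which, according to the abstract, the spreading of the sample would be controlled; which distributions are suitable is not specified here.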
-
Probability Sampling Designs: Principles for Choice of Design and Balancing
Authors:
Yves Tillé,
Matthieu Wilhelm
Abstract:
The aim of this paper is twofold. First, three theoretical principles are formalized: randomization, overrepresentation and restriction. We develop these principles and give a rationale for their use in choosing the sampling design in a systematic way. In the model-assisted framework, knowledge of the population is formalized by modelling the population and the sampling design is chosen accordingly. We show how the principles of overrepresentation and of restriction naturally arise from the modelling of the population. The balanced sampling then appears as a consequence of the modelling. Second, a review of probability balanced sampling is presented through the model-assisted framework. For some basic models, balanced sampling can be shown to be an optimal sampling design. Emphasis is placed on new spatial sampling methods and their related models. An illustrative example shows the advantages of the different methods. Throughout the paper, various examples illustrate how the three principles can be applied in order to improve inference.
Submitted 15 December, 2016;
originally announced December 2016.
-
Quasi-Systematic Sampling From a Continuous Population
Authors:
Matthieu Wilhelm,
Yves Tillé,
Lionel Qualité
Abstract:
A specific family of point processes is introduced that allows samples to be selected for the purpose of estimating the mean or the integral of a function of a real variable. These processes, called quasi-systematic processes, depend on a tuning parameter $r>0$ that controls the likelihood of jointly selecting neighboring units in the same sample. When $r$ is large, units that are close tend not to be selected together and samples are well spread. When $r$ tends to infinity, the sampling design is close to systematic sampling. For all $r > 0$, the first and second-order unit inclusion densities are positive, allowing for unbiased estimators of variance.
Algorithms to generate these sampling processes for any positive real value of $r$ are presented. When $r$ is large, the estimator of variance is unstable. It follows that $r$ must be chosen by the practitioner as a trade-off between an accurate estimation of the target parameter and an accurate estimation of the variance of the parameter estimator. The method's advantages are illustrated with a set of simulations.
Submitted 18 July, 2016;
originally announced July 2016.
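For orientation, the limiting case named in the abstract above (systematic sampling, approached as $r$ tends to infinity) can be sketched as below. This is plain systematic sampling on the unit interval, not the quasi-systematic process itself, and the names `systematic_points` and `estimate_integral` are hypothetical, introduced only for this example.

```python
import random


def systematic_points(n, rng):
    """Return n equally spaced points on [0, 1) with a uniform random start."""
    start = rng.random() / n  # uniform on [0, 1/n)
    return [start + k / n for k in range(n)]


def estimate_integral(f, n, rng):
    """Estimate the integral of f over [0, 1] by the sample mean of f."""
    points = systematic_points(n, rng)
    return sum(f(x) for x in points) / n


if __name__ == "__main__":
    rng = random.Random(0)
    # The integral of x**2 over [0, 1] is 1/3.
    print(estimate_integral(lambda x: x * x, 100, rng))
```

Because every pair of points is a fixed distance apart, second-order inclusion densities vanish for most pairs in this limiting design, which is consistent with the abstract's remark that variance estimation becomes unstable when $r$ is large.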
-
Balanced $k$-nearest neighbor imputation
Authors:
Caren Hasler,
Yves Tillé
Abstract:
In order to overcome the problem of item nonresponse, random imputation methods are often used because they tend to preserve the distribution of the imputed variable. Among the random imputation methods, the random hot-deck has the interesting property of imputing observed values. A new random hot-deck imputation method is proposed. The key innovation of this method is that the selection of donors is viewed as a sampling problem and uses calibration and balanced sampling. This approach makes it possible to select donors such that if the auxiliary variables were imputed, their estimated totals would not change. As a consequence, very accurate and stable estimates of totals can be obtained. Moreover, the method is based on a nonparametric procedure. Donors are selected in neighborhoods of recipients. In this way, the missing value of a recipient is replaced with an observed value of a similar unit. This new approach is very flexible and can greatly improve the quality of the estimates. Also, this method is unbiased under very different models and is thus resistant to model misspecification. Finally, the new method makes it possible to introduce edit rules while imputing.
Submitted 29 January, 2015;
originally announced January 2015.