Search | arXiv e-print repository

doi 10.1002/env.2776

Sequential Spatially Balanced Sampling

Authors: Raphaël Jauslin, Bardia Panahbehagh, Yves Tillé

Abstract: Sequential sampling occurs when the entire population is not known in advance and data are obtained one at a time or in groups of units. This manuscript proposes a new algorithm to sequentially select a balanced sample. The algorithm respects equal and unequal inclusion probabilities. The method can also be used to select a spatially balanced sample if the population of interest contains spatial coordinates. A simulation study is proposed and the results show that the proposed method outperforms other methods.

Submitted 1 June, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

Journal ref: (2022), Environmetrics, Volume 33, Issue 8

arXiv:2111.09309

Stream Sampling with Immediate Decision

Authors: Bardia Panahbehagh, Raphaël Jauslin, Yves Tillé

Abstract: The manuscript introduces a method to select a random sample from a stream by deciding on each sampling unit immediately after observing it. The process can be applied to unequal as well as equal probability sampling. The implementation is straightforward. The algorithm selects a unit in the sample based on a single condition. It is particularly effective for making direct decisions on stream data, whether the data arrive in groups or the stream is linear.

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2111.08433

Sequential Unequal Probability Sampling For Stream Population

Authors: Bardia Panahbehagh, Raphaël Jauslin, Yves Tillé

Abstract: A new unequal probability sampling method is proposed. This method is sequential. The decision to select or not each unit is made based on the order in which the units appear. A variant of this method allows selecting a sample from a stream. At each step, the decision to take the units successively according to the order of appearance in the stream is made. This method involves using a sliding window that is as small as possible. The method also allows the sample to be spread and even the level of spreading to be adjusted.

Submitted 16 November, 2021; originally announced November 2021.

arXiv:2105.08379

doi 10.1016/j.jspi.2022.12.003

An Efficient Approach for Statistical Matching of Survey Data Through Calibration, Optimal Transport and Balanced Sampling

Authors: Raphaël Jauslin, Yves Tillé

Abstract: Statistical matching aims to integrate two statistical sources. These sources can be two samples or a sample and the entire population. If two samples have been selected from the same population and information has been collected on different variables of interest, then it is interesting to match the two surveys to analyse, for example, contingency tables or covariances. In this paper, we propose an efficient method for matching two samples that may each contain a weighting scheme. The method matches the records of the two sources. Several variants are proposed in order to create a directly usable file integrating data from both information sources.

Submitted 10 March, 2022; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: 17 pages, 3 figures, 3 tables

Journal ref: (2023), Journal of Statistical Planning and Inference, Volume 225, Pages 121-131

arXiv:2101.05568

doi 10.1007/s42081-021-00134-y

Enhanced Cube Implementation For Highly Stratified Population

Authors: Raphaël Jauslin, Esther Eustache, Yves Tillé

Abstract: A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the estimators. We propose here a new method to handle the selection of a balanced sample in a highly stratified population. The method substantially improves the commonly used sampling design and reduces the time-consuming problem that could arise if inclusion probabilities within strata do not sum to an integer.

Submitted 14 January, 2021; originally announced January 2021.

Journal ref: Japanese Journal of Statistics and Data Science, 4, 783-795 (2021)

arXiv:1910.13152

doi 10.1007/s13253-020-00407-1

Spatial Spread Sampling Using Weakly Associated Vectors

Authors: Raphaël Jauslin, Yves Tillé

Abstract: Geographical data are generally autocorrelated. In this case, it is preferable to select spread units. In this paper, we propose a new method for selecting well-spread samples from a finite spatial population with equal or unequal inclusion probabilities. The proposed method is based on the definition of a spatial structure by using a stratification matrix. Our method exactly satisfies given inclusion probabilities and provides samples that are very well-spread. A set of simulations shows that our method outperforms other existing methods such as the Generalized Random Tessellation Stratified (GRTS) or the Local Pivotal Method (LPM). Analysis of the variance on a real dataset shows that our method is more accurate than these two. Furthermore, a variance estimator is proposed.

Submitted 28 July, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: To appear in JABES

Journal ref: Journal of Agricultural, Biological and Environmental Statistics 25 (2020) 431-451

arXiv:1710.04549

Measuring the spatial balance of a sample: A new measure based on the Moran's I index

Authors: Yves Tillé, Maria Michela Dickson, Giuseppe Espa, Diego Giuliani

Abstract: Measuring the degree of spatial spreading of a sample can be of great interest when sampling from a spatial population. The commonly used spatial balance index by Grafström et al. (2012) is particularly effective in comparing the level of spatial spreading of different samples from the same population. However, its unbounded and uninterpretable scale of measurement does not allow the level of spatial spreading to be assessed in absolute terms and confines its use to only raw comparisons. In this paper, we introduce a new absolute measure of the spatial spreading of a sample using a normalized version of the Moran's $I$ index. The properties and behaviour of the proposed measure are analysed through two simulation experiments, one based on artificial populations and the other on a population of real business units located in the province of Siena (Italy).

Submitted 27 November, 2017; v1 submitted 12 October, 2017; originally announced October 2017.

arXiv:1701.02483

Sampling Designs on Finite Populations with Spreading Control Parameters

Authors: Yves Tillé, Lionel Qualité, Matthieu Wilhelm

Abstract: We present new sampling methods for finite populations that allow control of the joint inclusion probabilities of units and especially the spreading of sampled units in the population. They are based on the use of renewal chains and multivariate discrete distributions to generate the difference of population ranks between two successive selected units. With a Bernoulli sampling design, these differences follow a geometric distribution, and with a simple random sampling design they follow a negative hypergeometric distribution. We propose to use other distributions and introduce a large class of sampling designs with and without fixed sample size. The choice of the rank-difference distribution allows us to control the units' joint inclusion probabilities with a relatively simple method and closed-form formula. Joint inclusion probabilities of neighboring units can be chosen to be larger, or smaller, compared to those of Bernoulli or simple random sampling, thus allowing the sample to be spread more or less over the population. This can be useful when neighboring units have similar characteristics or, on the contrary, are very different. A set of simulations illustrates the qualities of this method.

Submitted 11 April, 2017; v1 submitted 10 January, 2017; originally announced January 2017.

Comments: Accepted in Statistica Sinica, typos and minor modifications

arXiv:1612.04965

Probability Sampling Designs: Principles for Choice of Design and Balancing

Authors: Yves Tillé, Matthieu Wilhelm

Abstract: The aim of this paper is twofold. First, three theoretical principles are formalized: randomization, overrepresentation and restriction. We develop these principles and give a rationale for their use in choosing the sampling design in a systematic way. In the model-assisted framework, knowledge of the population is formalized by modelling the population and the sampling design is chosen accordingly. We show how the principles of overrepresentation and of restriction naturally arise from the modelling of the population. The balanced sampling then appears as a consequence of the modelling. Second, a review of probability balanced sampling is presented through the model-assisted framework. For some basic models, balanced sampling can be shown to be an optimal sampling design. Emphasis is placed on new spatial sampling methods and their related models. An illustrative example shows the advantages of the different methods. Throughout the paper, various examples illustrate how the three principles can be applied in order to improve inference.

Submitted 15 December, 2016; originally announced December 2016.

Comments: Accepted paper, Statistical Science

arXiv:1607.04993

Quasi-Systematic Sampling From a Continuous Population

Authors: Matthieu Wilhelm, Yves Tillé, Lionel Qualité

Abstract: A specific family of point processes is introduced that allows samples to be selected for the purpose of estimating the mean or the integral of a function of a real variable. These processes, called quasi-systematic processes, depend on a tuning parameter $r>0$ that makes it possible to control the likelihood of jointly selecting neighboring units in the same sample. When $r$ is large, units that are close tend not to be selected together and samples are well spread. When $r$ tends to infinity, the sampling design is close to systematic sampling. For all $r > 0$, the first and second-order unit inclusion densities are positive, allowing for unbiased estimators of variance. Algorithms to generate these sampling processes for any positive real value of $r$ are presented. When $r$ is large, the estimator of variance is unstable. It follows that $r$ must be chosen by the practitioner as a trade-off between an accurate estimation of the target parameter and an accurate estimation of the variance of the parameter estimator. The method's advantages are illustrated with a set of simulations.

Submitted 18 July, 2016; originally announced July 2016.

arXiv:1501.07622

doi 10.1080/02331888.2016.1230615

Balanced $k$-nearest neighbor imputation

Authors: Caren Hasler, Yves Tillé

Abstract: In order to overcome the problem of item nonresponse, random imputation methods are often used because they tend to preserve the distribution of the imputed variable. Among the random imputation methods, the random hot-deck has the interesting property of imputing observed values. A new random hot-deck imputation method is proposed. The key innovation of this method is that the selection of donors is viewed as a sampling problem and uses calibration and balanced sampling. This approach makes it possible to select donors such that if the auxiliary variables were imputed, their estimated totals would not change. As a consequence, very accurate and stable totals estimations can be obtained. Moreover, the method is based on a nonparametric procedure. Donors are selected in neighborhoods of recipients. In this way, the missing value of a recipient is replaced with an observed value of a similar unit. This new approach is very flexible and can greatly improve the quality of estimations. Also, this method is unbiased under very different models and is thus resistant to model misspecification. Finally, the new method makes it possible to introduce edit rules while imputing.

Submitted 29 January, 2015; originally announced January 2015.

Showing 1–11 of 11 results for author: Tillé, Y