A Tale of Two Datasets: Representativeness and Generalisability of Inference for Samples of Networks

Krivitsky, Pavel N.; Coletti, Pietro; Hens, Niel

doi:10.1080/01621459.2023.2242627

arXiv:2202.03685 (stat)

[Submitted on 8 Feb 2022 (v1), last revised 18 Jul 2023 (this version, v4)]

Authors: Pavel N. Krivitsky (1), Pietro Coletti (2), Niel Hens (2 and 3) ((1) Department of Statistics and UNSW Data Science Hub, School of Mathematics and Statistics, University of New South Wales, Sydney, Australia, (2) I-BioStat, Data Science Institute, Hasselt University, Hasselt, Belgium, (3) Centre for Health Economics and Modelling Infectious Diseases, Vaccine and Infectious Disease Institute, University of Antwerp, Antwerp, Belgium)

Abstract: The last two decades have seen considerable progress in foundational aspects of statistical network analysis, but the path from theory to application is not straightforward. Two large, heterogeneous samples of small networks of within-household contacts in Belgium were collected using two different but complementary sampling designs: one smaller but with all contacts in each household observed, the other larger and more representative but recording contacts of only one person per household. We wish to combine their strengths to learn the social forces that shape household contact formation and facilitate simulation for prediction of disease spread, while generalising to the population of households in the region.
To accomplish this, we describe a flexible framework for specifying multi-network models in the exponential family class and identify the requirements for inference and prediction under this framework to be consistent, identifiable, and generalisable, even when data are incomplete; explore how these requirements may be violated in practice; and develop a suite of quantitative and graphical diagnostics for detecting violations and suggesting improvements to candidate models. We report on the effects of network size, geography, and household roles on household contact patterns (activity, heterogeneity in activity, and triadic closure).

Comments:	101 pages (3 front matter, 26 body, 72 appendix), 35 figures (4 body, 31 appendix), 66 tables (1 body, 61 appendix)
Subjects:	Methodology (stat.ME); Applications (stat.AP)
Cite as:	arXiv:2202.03685 [stat.ME]
	(or arXiv:2202.03685v4 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2202.03685
Journal reference:	Journal of the American Statistical Association, 118(544), 2213-2224 (2023)
Related DOI:	https://doi.org/10.1080/01621459.2023.2242627

Submission history

From: Pavel Krivitsky
[v1] Tue, 8 Feb 2022 07:10:26 UTC (8,045 KB)
[v2] Thu, 17 Feb 2022 20:46:16 UTC (5,535 KB)
[v3] Fri, 24 Feb 2023 02:39:53 UTC (10,600 KB)
[v4] Tue, 18 Jul 2023 04:04:59 UTC (10,594 KB)
