-
On the Validity of Conformal Prediction for Network Data Under Non-Uniform Sampling
Authors:
Robert Lunde
Abstract:
We study the properties of conformal prediction for network data under various sampling mechanisms that commonly arise in practice but often result in a non-representative sample of nodes. We interpret these sampling mechanisms as selection rules applied to a superpopulation and study the validity of conformal prediction conditional on an appropriate selection event. We show that the sampled subarray is exchangeable conditional on the selection event if the selection rule satisfies a permutation invariance property and a joint exchangeability condition holds for the superpopulation. Our result implies the finite-sample validity of conformal prediction for certain selection events related to ego networks and snowball sampling. We also show that when data are sampled via a random walk on a graph, a variant of weighted conformal prediction yields asymptotically valid prediction sets for an independently selected node from the population.
Submitted 13 July, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Conformal Prediction for Network-Assisted Regression
Authors:
Robert Lunde,
Elizaveta Levina,
Ji Zhu
Abstract:
An important problem in network analysis is predicting a node attribute using both network covariates, such as graph embedding coordinates or local subgraph counts, and conventional node covariates, such as demographic characteristics. While standard regression methods that make use of both types of covariates may be used for prediction, statistical inference is complicated by the fact that the nodal summary statistics are often dependent in complex ways. We show that under a mild joint exchangeability assumption, a network analog of conformal prediction achieves finite sample validity for a wide range of network covariates. We also show that a form of asymptotic conditional validity is achievable. The methods are illustrated on both simulated networks and a citation network dataset.
Submitted 22 February, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
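For readers unfamiliar with the conformal prediction mentioned in the abstract above, the following is a minimal sketch of generic split conformal prediction with absolute-residual scores. It is not the network procedure from the paper: the function names, the choice of score, and the assumption that point predictions come from some separately fitted model are all illustrative. Finite-sample validity of this construction relies on exchangeability of the calibration and test points, which is the kind of condition the abstract discusses for network covariates.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the interval must be unbounded, so return inf.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_interval(cal_predictions, cal_responses, new_prediction, alpha):
    """Build a prediction interval around new_prediction.

    cal_predictions / cal_responses: predictions and observed responses on a
    held-out calibration set (hypothetical inputs; the model is fitted elsewhere).
    """
    # Nonconformity score: absolute residual on each calibration point.
    scores = [abs(y - p) for p, y in zip(cal_predictions, cal_responses)]
    q = conformal_quantile(scores, alpha)
    return new_prediction - q, new_prediction + q
```

With three calibration residuals 1, 2, 3 and `alpha = 0.5`, the rank is `ceil(4 * 0.5) = 2`, so the interval half-width is 2.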
-
Bootstrapping the error of Oja's algorithm
Authors:
Robert Lunde,
Purnamrita Sarkar,
Rachel Ward
Abstract:
We consider the problem of quantifying uncertainty for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis, where the data are generated IID from some unknown distribution. By combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a weighted $χ^2$ approximation result for the $\sin^2$ error between the population eigenvector and the output of Oja's algorithm. Since estimating the covariance matrix associated with the approximating distribution requires knowledge of unknown model parameters, we propose a multiplier bootstrap algorithm that may be updated in an online manner. We establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.
Submitted 19 May, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
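As background for the abstract above, here is a minimal sketch of the basic Oja update for streaming PCA together with the $\sin^2$ error it refers to. It does not implement the paper's multiplier bootstrap. The constant step size, the deterministic initial vector, and the synthetic diagonal-covariance data are assumptions made purely for illustration.

```python
import math
import random


def oja_leading_eigenvector(samples, step_size):
    """One pass of Oja's update: w <- normalize(w + step_size * x * (x . w))."""
    d = len(samples[0])
    w = [1.0 / math.sqrt(d)] * d  # assumed deterministic unit-norm start
    for x in samples:
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + step_size * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


def sin2_error(u, v):
    """sin^2 of the angle between u and v, i.e. 1 - cos^2."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / (nu * nv)) ** 2


# Synthetic IID data with covariance diag(9, 0.25, 0.25), so the
# population leading eigenvector is the first coordinate axis.
rng = random.Random(0)
samples = [
    [3.0 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1), 0.5 * rng.gauss(0, 1)]
    for _ in range(5000)
]
w_hat = oja_leading_eigenvector(samples, step_size=0.01)
error = sin2_error(w_hat, [1.0, 0.0, 0.0])
```

Each sample is touched once and only the current vector `w` is stored, which is what makes the method suitable for streaming data.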
-
Trading off Accuracy for Speedup: Multiplier Bootstraps for Subgraph Counts
Authors:
Qiaohui Lin,
Robert Lunde,
Purnamrita Sarkar
Abstract:
We propose a new class of multiplier bootstraps for count functionals, ranging from a fast, approximate linear bootstrap tailored to sparse, massive graphs to a quadratic bootstrap procedure that offers refined accuracy for smaller, denser graphs. For the fast, approximate linear bootstrap, we show that $\sqrt{n}$-consistent inference of the count functional is attainable in certain computational regimes that depend on the sparsity level of the graph. Furthermore, even in more challenging regimes, we prove that our bootstrap procedure offers valid coverage and vanishing confidence intervals. For the quadratic bootstrap, we establish an Edgeworth expansion and show that this procedure offers higher-order accuracy under appropriate sparsity conditions. We complement our theoretical results with a simulation study and real data analysis and verify that our procedure offers state-of-the-art performance for several functionals.
Submitted 7 April, 2022; v1 submitted 13 September, 2020;
originally announced September 2020.
-
On the Theoretical Properties of the Network Jackknife
Authors:
Qiaohui Lin,
Robert Lunde,
Purnamrita Sarkar
Abstract:
We study the properties of a leave-node-out jackknife procedure for network data. Under the sparse graphon model, we prove an Efron-Stein-type inequality, showing that the network jackknife leads to conservative estimates of the variance (in expectation) for any network functional that is invariant to node permutation. For a general class of count functionals, we also establish consistency of the network jackknife. We complement our theoretical analysis with a range of simulated and real-data examples and show that the network jackknife offers competitive performance in cases where other resampling methods are known to be valid. In fact, for several network statistics, we see that the jackknife provides more accurate inferences compared to related methods such as subsampling.
Submitted 21 April, 2020; v1 submitted 19 April, 2020;
originally announced April 2020.
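The leave-node-out jackknife described in the abstract above can be sketched as follows. This is an illustrative version that assumes a simple undirected graph with each edge listed once, uses edge density as the example statistic, and takes the variance estimate to be the sum of squared deviations of the leave-one-out statistics from their mean. Scaling conventions for the jackknife vary, so the exact normalization here should be read as an assumption rather than the paper's definition.

```python
def edge_density(nodes, edges):
    """Fraction of possible node pairs joined by an edge in the induced subgraph."""
    node_set = set(nodes)
    m = len(node_set)
    if m < 2:
        return 0.0
    kept = sum(1 for u, v in edges if u in node_set and v in node_set)
    return kept / (m * (m - 1) / 2)


def network_jackknife_variance(nodes, edges, statistic):
    """Delete each node in turn, recompute the statistic on the induced
    subgraph, and sum the squared deviations from the leave-one-out mean."""
    nodes = list(nodes)
    leave_one_out = [
        statistic([v for v in nodes if v != i], edges) for i in nodes
    ]
    mean = sum(leave_one_out) / len(leave_one_out)
    return sum((t - mean) ** 2 for t in leave_one_out)
```

For the path graph on nodes 0, 1, 2, the leave-one-out densities are 1, 0, 1, giving a variance estimate of 2/3. Any statistic that is invariant to node relabeling can be passed in place of `edge_density`.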
-
Subsampling Sparse Graphons Under Minimal Assumptions
Authors:
Robert Lunde,
Purnamrita Sarkar
Abstract:
We establish a general theory for subsampling network data generated by the sparse graphon model. In contrast to previous work for networks, we demonstrate validity under minimal assumptions; the main requirement is weak convergence of the functional of interest. We study the properties of two procedures: vertex subsampling and $p$-subsampling. For the first, we prove validity under the mild condition that the number of subsampled vertices is $o(n)$. For the second, we establish validity under analogous conditions on the expected subsample size. For both procedures, we also establish conditions under which uniform validity holds. Furthermore, under appropriate sparsity conditions, we derive limiting distributions for the nonzero eigenvalues of the adjacency matrix of a low rank sparse graphon. Our weak convergence result immediately yields the validity of subsampling for the nonzero eigenvalues under suitable assumptions.
Submitted 25 August, 2019; v1 submitted 29 July, 2019;
originally announced July 2019.