-
Valid Bootstraps for Network Embeddings with Applications to Network Visualisation
Authors:
Emerald Dilworth,
Ed Davis,
Daniel J. Lawson
Abstract:
Quantifying uncertainty in networks is an important step in modelling relationships and interactions between entities. We consider the challenge of bootstrapping an inhomogeneous random graph when only a single observation of the network is made and the underlying data generating function is unknown. We address this problem by considering embeddings of the observed and bootstrapped network that are statistically indistinguishable. We utilise an exchangeable network test that can empirically validate bootstrap samples generated by any method. Existing methods fail this test, so we propose a principled, distribution-free network bootstrap using k-nearest neighbour smoothing that can pass this exchangeable network test in many synthetic and real-data scenarios. We demonstrate the utility of this work in combination with the popular data visualisation method t-SNE, where uncertainty estimates from bootstrapping are used to explain whether visible structures represent real, statistically sound structures.
Submitted 14 May, 2025; v1 submitted 28 October, 2024;
originally announced October 2024.
-
Valid Conformal Prediction for Dynamic GNNs
Authors:
Ed Davis,
Ian Gallagher,
Daniel John Lawson,
Patrick Rubin-Delanchy
Abstract:
Dynamic graphs provide a flexible data abstraction for modelling many sorts of real-world systems, such as transport, trade, and social networks. Graph neural networks (GNNs) are powerful tools allowing for different kinds of prediction and inference on these systems, but getting a handle on uncertainty, especially in dynamic settings, is a challenging problem. In this work we propose to use a dynamic graph representation known in the tensor literature as the unfolding, to achieve valid prediction sets via conformal prediction. This representation, a simple graph, can be input to any standard GNN and does not require any modification to existing GNN architectures or conformal prediction routines. One of our key contributions is a careful mathematical consideration of the different inference scenarios which can arise in a dynamic graph modelling context. For a range of practically relevant cases, we obtain valid prediction sets with almost no assumptions, even dispensing with exchangeability. In a more challenging scenario, which we call the semi-inductive regime, we achieve valid prediction under stronger assumptions, akin to stationarity. We provide real data examples demonstrating validity, showing improved accuracy over baselines, and sign-posting different failure modes which can occur when those assumptions are violated.
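As a rough illustration of the kind of standard conformal prediction routine the abstract refers to, here is a minimal split-conformal sketch for classification. It is not the authors' code: the function names, the choice of score (one minus the predicted probability of the true class), and the toy numbers are all assumptions, and in practice the probabilities would come from a GNN applied to the unfolded graph.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal score threshold from held-out calibration predictions."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Non-conformity score (an assumed choice): 1 - probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return float(np.sort(scores)[k - 1])

def prediction_set(probs, threshold):
    """Labels whose non-conformity score does not exceed the threshold."""
    probs = np.asarray(probs, dtype=float)
    return [int(label) for label in np.flatnonzero(1.0 - probs <= threshold)]

# Toy calibration data (hypothetical): three nodes, two classes.
cal_probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
cal_labels = [0, 1, 0]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
print(prediction_set([0.8, 0.2], threshold))
```

The routine only sees predicted probabilities, which is consistent with the abstract's point that neither the GNN architecture nor the conformal step needs modification.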
Submitted 26 March, 2025; v1 submitted 29 May, 2024;
originally announced May 2024.
-
SB-ETAS: using simulation based inference for scalable, likelihood-free inference for the ETAS model of earthquake occurrences
Authors:
Samuel Stockman,
Daniel J. Lawson,
Maximilian J. Werner
Abstract:
Performing Bayesian inference for the Epidemic-Type Aftershock Sequence (ETAS) model of earthquakes typically requires MCMC sampling using the likelihood function or estimating the latent branching structure. These tasks have computational complexity $O(n^2)$ with the number of earthquakes and therefore do not scale well with new enhanced catalogs, which can now contain on the order of $10^6$ events. On the other hand, simulation from the ETAS model can be done more quickly, in $O(n \log n)$ time. We present SB-ETAS: simulation-based inference for the ETAS model. This is an approximate Bayesian method which uses Sequential Neural Posterior Estimation (SNPE), a machine learning based algorithm for learning posterior distributions from simulations. SB-ETAS can successfully approximate ETAS posterior distributions on shorter catalogues where it is computationally feasible to compare with MCMC sampling. Furthermore, the scaling of SB-ETAS makes it feasible to fit to very large earthquake catalogs, such as one for Southern California dating back to 1932. SB-ETAS can find Bayesian estimates of ETAS parameters for this catalog in less than 10 hours on a standard laptop, which would have taken over 2 weeks using MCMC. Looking beyond the standard ETAS model, this simulation based framework would allow earthquake modellers to define and infer parameters for much more complex models that have intractable likelihood functions.
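To make the $O(n^2)$ cost of the likelihood concrete, the sketch below evaluates a temporal ETAS log-likelihood in which every event's intensity sums over all earlier events. The parameterisation (background rate `mu`, productivity `K * exp(alpha * (m - m0))`, Omori-Utsu kernel `(t + c) ** -p` with `p != 1`) is one common textbook form and is an assumption here, not necessarily the form used in the paper; the function name and toy catalogue are hypothetical.

```python
import numpy as np

def etas_log_likelihood(times, mags, T, mu, K, alpha, c, p, m0):
    """Temporal ETAS log-likelihood on [0, T]; assumes p != 1."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    # Expected triggering strength of each event (assumed exponential form).
    productivity = K * np.exp(alpha * (mags - m0))
    # Pairwise lags dt[j, i] = t_j - t_i: this n x n array is the O(n^2) step.
    dt = times[:, None] - times[None, :]
    past = dt > 0
    kernel = np.where(past, (np.where(past, dt, 1.0) + c) ** (-p), 0.0)
    # Conditional intensity at each event: background plus triggered part.
    intensity = mu + kernel @ productivity
    # Closed-form integral of the intensity over [0, T].
    triggered = productivity * (c ** (1 - p) - (T - times + c) ** (1 - p)) / (p - 1)
    integral = mu * T + triggered.sum()
    return float(np.log(intensity).sum() - integral)

# Toy catalogue (hypothetical values): two events of magnitude 3.
print(etas_log_likelihood([1.0, 2.0], [3.0, 3.0], T=3.0,
                          mu=1.0, K=1.0, alpha=1.0, c=1.0, p=2.0, m0=3.0))
```

The pairwise lag matrix is what becomes prohibitive for catalogues with around $10^6$ events, which is the motivation the abstract gives for a simulation-based alternative.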
Submitted 28 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
A Simple and Powerful Framework for Stable Dynamic Network Embedding
Authors:
Ed Davis,
Ian Gallagher,
Daniel John Lawson,
Patrick Rubin-Delanchy
Abstract:
In this paper, we address the problem of dynamic network embedding, that is, representing the nodes of a dynamic network as evolving vectors within a low-dimensional space. While the field of static network embedding is wide and established, the field of dynamic network embedding is comparatively in its infancy. We propose that a wide class of established static network embedding methods can be used to produce interpretable and powerful dynamic network embeddings when they are applied to the dilated unfolded adjacency matrix. We provide a theoretical guarantee that, regardless of embedding dimension, these unfolded methods will produce stable embeddings, meaning that nodes with identical latent behaviour will be exchangeable, regardless of their position in time or space. We additionally define a hypothesis testing framework which can be used to evaluate the quality of a dynamic network embedding by testing for planted structure in simulated networks. Using this, we demonstrate that, even in trivial cases, unstable methods are often either conservative or encode incorrect structure. In contrast, we demonstrate that our suite of stable unfolded methods are not only more interpretable but also more powerful in comparison to their unstable counterparts.
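The abstract does not define the dilated unfolded adjacency matrix, so the sketch below rests on two assumptions: that the unfolding is the column-wise concatenation of the per-time-step adjacency matrices, and that the dilation is the usual symmetric block construction. The function name and toy snapshots are hypothetical.

```python
import numpy as np

def dilated_unfolding(snapshots):
    """Build a symmetric matrix from a list of n x n adjacency snapshots."""
    # Unfolding (assumed definition): A = [A_1 | A_2 | ... | A_T], shape n x nT.
    unfolded = np.hstack([np.asarray(a, dtype=float) for a in snapshots])
    n, n_times_t = unfolded.shape
    # Dilation (assumed definition): [[0, A], [A^T, 0]], a symmetric matrix
    # that can be read as the adjacency matrix of a single static graph.
    top = np.hstack([np.zeros((n, n)), unfolded])
    bottom = np.hstack([unfolded.T, np.zeros((n_times_t, n_times_t))])
    return np.vstack([top, bottom])

# Two toy snapshots of a 3-node network (hypothetical data).
a1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a2 = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
dilated = dilated_unfolding([a1, a2])
print(dilated.shape)  # (9, 9): 3 nodes plus 3 nodes for each of 2 time steps
```

Under these assumptions the result is an ordinary symmetric adjacency matrix, so a static embedding method can be applied to it directly, with the last nT rows corresponding to node-time pairs.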
Submitted 14 November, 2023;
originally announced November 2023.
-
Forecasting the 2016-2017 Central Apennines Earthquake Sequence with a Neural Point Process
Authors:
Samuel Stockman,
Daniel J. Lawson,
Maximilian J. Werner
Abstract:
Point processes have been dominant in modeling the evolution of seismicity for decades, with the Epidemic Type Aftershock Sequence (ETAS) model being most popular. Recent advances in machine learning have constructed highly flexible point process models using neural networks to improve upon existing parametric models. We investigate whether these flexible point process models can be applied to short-term seismicity forecasting by extending an existing temporal neural model to the magnitude domain and we show how this model can forecast earthquakes above a target magnitude threshold. We first demonstrate that the neural model can fit synthetic ETAS data while requiring less computational time, because it is not dependent on the full history of the sequence. By artificially emulating short-term aftershock incompleteness in the synthetic dataset, we find that the neural model outperforms ETAS. Using a new enhanced catalog from the 2016-2017 Central Apennines earthquake sequence, we investigate the predictive skill of ETAS and the neural model with respect to the lowest input magnitude. Constructing multiple forecasting experiments using the Visso, Norcia and Campotosto earthquakes to partition training and testing data, we target M3+ events. We find both models perform similarly at previously explored thresholds (e.g., above M3), but lowering the threshold to M1.2 reduces the performance of ETAS but not that of the neural model. We argue that some of these gains are due to the neural model's ability to handle incomplete data. The robustness to missing data and speed to train the neural model present it as an encouraging competitor in earthquake forecasting.
Submitted 2 October, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
CLARITY -- Comparing heterogeneous data using dissimiLARITY
Authors:
Daniel J. Lawson,
Vinesh Solanki,
Igor Yanovich,
Johannes Dellert,
Damian Ruck,
Phillip Endicott
Abstract:
Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale, and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise, and aids in their interpretation. We illustrate this using three diverse comparisons: gene methylation vs expression, evolution of language sounds vs word use, and country-level economic metrics vs cultural beliefs. The non-parametric approach is robust to noise and differences in scaling, and makes only weak assumptions about how the data were generated. It operates by decomposing similarities into two components: a `structural' component analogous to a clustering, and an underlying `relationship' between those structures. This allows a `structural comparison' between two similarity matrices using their predictability from `structure'. Significance is assessed with the help of re-sampling appropriate for each dataset. The software, CLARITY, is available as an R package from https://github.com/danjlawson/CLARITY.
Submitted 2 December, 2021; v1 submitted 29 May, 2020;
originally announced June 2020.
-
A general decision framework for structuring computation using Data Directional Scaling to process massive similarity matrices
Authors:
Daniel John Lawson,
Niall M Adams
Abstract:
As datasets grow it becomes infeasible to process them completely with a desired model. For giant datasets, we frame the order in which computation is performed as a decision problem. The order is designed so that partial computations are of value and early stopping yields useful results. Our approach comprises two related tools: a decision framework to choose the order to perform computations, and an emulation framework to enable estimation of the unevaluated computations. The approach is applied to the problem of computing similarity matrices, for which the cost of computation grows quadratically with the number of objects. Reasoning about similarities before they are observed introduces difficulties as there is no natural space and hence comparisons are difficult. We solve this by introducing a computationally convenient form of multidimensional scaling we call `data directional scaling'. High quality estimation is possible with massively reduced computation from the naive approach, and can be scaled to very large matrices. The approach is applied to the practical problem of assessing genetic similarity in population genetics. The use of statistical reasoning in decision making for large scale problems promises to be an important tool in applying statistical methodology to Big Data.
Submitted 17 March, 2014;
originally announced March 2014.
-
Apparent strength conceals instability in a model for the collapse of historical states
Authors:
Daniel John Lawson,
Neeraj Oak
Abstract:
An explanation for the political processes leading to the sudden collapse of empires and states would be useful for understanding both historical and contemporary political events. We seek a general description of state collapse spanning eras and cultures, from small kingdoms to continental empires, drawing on a suitably diverse range of historical sources. Our aim is to provide an accessible verbal hypothesis that bridges the gap between mathematical and social methodology. We use game theory to determine whether factions within a state will accept the political status quo, or wish to better their circumstances through costly rebellion. In lieu of precise data, we verify our model using sensitivity analysis. We find that a small amount of dissatisfaction is typically harmless, but can trigger sudden collapse when there is a sufficient buildup of political inequality. Contrary to intuition, a state is predicted to be least stable when its leadership is at the height of its political power and thus most able to exert its influence through external warfare, lavish expense or autocratic decree.
Submitted 10 July, 2013;
originally announced July 2013.