-
Neural Spacetimes for DAG Representation Learning
Authors:
Haitz Sáez de Ocáriz Borde,
Anastasis Kratsios,
Marc T. Law,
Xiaowen Dong,
Michael Bronstein
Abstract:
We propose a class of trainable deep learning-based geometries called Neural Spacetimes (NSTs), which can universally represent nodes in weighted directed acyclic graphs (DAGs) as events in a spacetime manifold. While most works in the literature focus on undirected graph representation learning or causality embedding separately, our differentiable geometry can encode both graph edge weights in its spatial dimensions and causality in the form of edge directionality in its temporal dimensions. We use a product manifold that combines a quasi-metric (for space) and a partial order (for time). NSTs are implemented as three neural networks trained in an end-to-end manner: an embedding network, which learns to optimize the location of nodes as events in the spacetime manifold, and two other networks that optimize the space and time geometries in parallel, which we call a neural (quasi-)metric and a neural partial order, respectively. The latter two networks leverage recent ideas at the intersection of fractal geometry and deep learning to shape the geometry of the representation space in a data-driven fashion, unlike other works in the literature that use fixed spacetime manifolds such as Minkowski space or De Sitter space to embed DAGs. Our main theoretical guarantee is a universal embedding theorem, showing that any $k$-point DAG can be embedded into an NST with $1+\mathcal{O}(\log(k))$ distortion while exactly preserving its causal structure. The total number of parameters defining the NST is sub-cubic in $k$ and linear in the width of the DAG. If the DAG has a planar Hasse diagram, this is improved to $\mathcal{O}(\log(k)) + 2$ spatial and 2 temporal dimensions. We validate our framework computationally with synthetic weighted DAGs and real-world network embeddings; in both cases, the NSTs achieve lower embedding distortions than their counterparts using fixed spacetime geometries.
Submitted 9 March, 2025; v1 submitted 25 August, 2024;
originally announced August 2024.
-
Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts
Authors:
Anastasis Kratsios,
Haitz Sáez de Ocáriz Borde,
Takashi Furuya,
Marc T. Law
Abstract:
Mixture-of-Experts (MoEs) can scale up beyond traditional deep learning models by employing a routing strategy in which each input is processed by a single "expert" deep learning model. This strategy allows us to scale up the number of parameters defining the MoE while maintaining sparse activation, i.e., MoEs only load a small number of their total parameters into GPU VRAM for the forward pass depending on the input. In this paper, we provide an approximation and learning-theoretic analysis of mixtures of expert MLPs with (P)ReLU activation functions. We first prove that for every error level $\varepsilon>0$ and every Lipschitz function $f:[0,1]^n\to \mathbb{R}$, one can construct a MoMLP model (a Mixture-of-Experts comprising (P)ReLU MLPs) which uniformly approximates $f$ to $\varepsilon$ accuracy over $[0,1]^n$, while only requiring networks of $\mathcal{O}(\varepsilon^{-1})$ parameters to be loaded in memory. Additionally, we show that MoMLPs can generalize since the entire MoMLP model has a (finite) VC dimension of $\tilde{O}(L\max\{nL,JW\})$, if there are $L$ experts and each expert has a depth and width of $J$ and $W$, respectively.
Submitted 25 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Graph Metanetworks for Processing Diverse Neural Architectures
Authors:
Derek Lim,
Haggai Maron,
Marc T. Law,
Jonathan Lorraine,
James Lucas
Abstract:
Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
Submitted 29 December, 2023; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Invariant Probabilistic Prediction
Authors:
Alexander Henzi,
Xinwei Shen,
Michael Law,
Peter Bühlmann
Abstract:
In recent years, there has been a growing interest in statistical methods that exhibit robust performance under distribution changes between training and test data. While most of the related research focuses on point predictions with the squared error loss, this article turns the focus towards probabilistic predictions, which aim to comprehensively quantify the uncertainty of an outcome variable given covariates. Within a causality-inspired framework, we investigate the invariance and robustness of probabilistic predictions with respect to proper scoring rules. We show that arbitrary distribution shifts do not, in general, admit invariant and robust probabilistic predictions, in contrast to the setting of point prediction. We illustrate how to choose evaluation metrics and restrict the class of distribution shifts to allow for identifiability and invariance in the prototypical Gaussian heteroscedastic linear model. Motivated by these findings, we propose a method to yield invariant probabilistic predictions, called IPP, and study the consistency of the underlying parameters. Finally, we demonstrate the empirical performance of our proposed procedure on simulated as well as on single-cell data.
Submitted 16 June, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
-
Longitudinal Position and Cancer Risk in the United States Revisited
Authors:
Jin Niu,
Charlotte Brown,
Michael Law,
Justin Colacino,
Ya'acov Ritov
Abstract:
Background: Debate over daylight saving time has intensified, with growing interest in the effects of sunlight exposure on health. Prior studies simulated daylight saving time and standard time conditions by analyzing different locations within time zones and neighboring areas across time zone borders.
Methods: We analyzed cancer incidence rates from various longitudinal positions within time zones and at time zone borders in the contiguous United States. Using data from State Cancer Profiles (2016-2020), we analyzed total incidence across 19 cancer types and specific rates for eight cancers, adjusted for age and including all demographics. Log-linear regression is used to replicate a previous study, and spatial regression models are employed to explore discontinuities at borders.
Results: Cancer rate differences lack statistical significance within time zones and near borders for total cancer and most individual cancers. Exceptions included breast, prostate, and liver and bile duct cancers, which exhibited significant relationships with relative position at the 95% significance level. Breast and liver and bile duct cancers saw decreases, while prostate cancer incidence increased from west to east within time zones.
Conclusions: Relative position does not have a significant impact on cancer incidence, and hence on cancer development in general. Isolated exceptions may warrant further investigation as more data become available.
Impact: Our findings challenge prior research, revealing numerous inconsistencies. These inconsistencies urge a reconsideration of the potential disparities in human health associated with daylight saving time and standard time, and they offer insights that contribute to the ongoing discussion surrounding the retention or abandonment of DST.
Submitted 28 November, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
A Rank-Based Sequential Test of Independence
Authors:
Alexander Henzi,
Michael Law
Abstract:
We consider the problem of independence testing for two univariate random variables in a sequential setting. By leveraging recent developments on safe, anytime-valid inference, we propose a test with time-uniform type I error control and derive explicit bounds on the finite sample performance of the test. We demonstrate the empirical performance of the procedure in comparison to existing sequential and non-sequential independence tests. Furthermore, since the proposed test is distribution free under the null hypothesis, we empirically simulate the gap due to Ville's inequality, the supermartingale analogue of Markov's inequality, that is commonly applied to control type I error in anytime-valid inference, and apply this to construct a truncated sequential test.
Submitted 25 January, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
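
The role of Ville's inequality in the abstract above can be illustrated with a toy simulation. Ville's inequality states that a nonnegative supermartingale started at 1 reaches the level 1/α with probability at most α. The sketch below is not the rank-based test from the paper: it assumes a hypothetical coin-betting test martingale (the names `crossing_rate`, `bet`, and `horizon` are illustrative), and only shows that the empirical crossing rate under the null tends to sit below α, which is the kind of gap the abstract refers to.

```python
import random

def crossing_rate(alpha=0.05, bet=0.5, horizon=200, paths=2000, seed=0):
    """Fraction of simulated null paths whose test martingale ever reaches 1/alpha."""
    rng = random.Random(seed)
    threshold = 1.0 / alpha
    crossings = 0
    for _ in range(paths):
        wealth = 1.0  # the martingale starts at M_0 = 1
        for _ in range(horizon):
            x = rng.choice((-1, 1))   # fair sign under the null, so E[x] = 0
            wealth *= 1.0 + bet * x   # stays nonnegative, conditional mean is 1
            if wealth >= threshold:   # reject as soon as the level 1/alpha is hit
                crossings += 1
                break
    return crossings / paths

rate = crossing_rate()
print(f"empirical crossing rate: {rate:.4f} (Ville bound: 0.05)")
```

Because the process overshoots the threshold when it crosses and the horizon is finite, the simulated rate is typically strictly below the bound, so a test that rejects at level 1/α is conservative.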
-
Optimal Patient Allocation in Multi-Arm Clinical Trials
Authors:
Martin Law
Abstract:
A multi-arm multi-stage trial is a multi-arm trial which includes interim analyses - analysing the data at certain specified points, generally discontinuing treatments which are concluded to not work and proceeding with the remainder.
It is possible that the advantages of multi-arm trials over single-arm trials may be enhanced further by considering the allocation ratio, R. For an R:1 allocation ratio, Rn patients are allocated to the control arm and n patients allocated to each active treatment arm. In this study, the optimal allocation ratio will be defined as the allocation ratio which results in the smallest total sample size satisfying some required power and probability of type I error. This is an intuitive definition in the context of clinical trials, as a smaller trial will in general be more ethical and less expensive than a larger one satisfying the same error rates. The purpose of this paper is to investigate the optimal allocation ratio in the case of multiple active treatment arms.
The setup for a single-stage trial with K active treatment arms is described in Section 2, along with a brief exposition of Dunnett's statement regarding the optimal allocation ratio in such circumstances. Equations for type I error and power are derived, and the methodology used to investigate how total sample size may be minimised using allocation ratio is described. A two-stage trial is then considered, using the same methodology. Figures and tables showing how total sample size changes with allocation ratio, for a range of type I error and power values, are given in Section 3. The possible ethical and financial benefits of changing allocation ratio, including a simple example, are also covered in Section 3. The results, and what they could mean in practical terms, are discussed in Section 4.
Submitted 11 November, 2022;
originally announced November 2022.
-
Multi-outcome trials with a generalised number of efficacious outcomes
Authors:
Martin Law,
Michael J. Grayling,
Adrian P. Mander
Abstract:
Existing multi-outcome designs focus almost entirely on evaluating whether all outcomes show evidence of efficacy or whether at least one outcome shows evidence of efficacy. While a small number of authors have provided multi-outcome designs that evaluate when a general number of outcomes show promise, these designs have been single-stage in nature only. We therefore propose two designs, of group-sequential and drop the loser form, that provide this design characteristic in a multi-stage setting. Previous such multi-outcome multi-stage designs have allowed only for a maximum of two outcomes; our designs thus also extend previous related proposals by permitting any number of outcomes.
Submitted 18 December, 2020;
originally announced December 2020.
-
The Hot Hand and Its Effect on the NBA
Authors:
Brian McNair,
Eric Margolin,
Michael Law,
Ya'acov Ritov
Abstract:
This paper aims to revisit and expand upon previous work on the "hot hand" phenomenon in basketball, specifically in the NBA. Using larger, modern data sets, we test streakiness of shooting patterns and the presence of hot hand behavior in free throw shooting, while going further by examining league-wide hot hand trends and the changes in individual player behavior. Additionally, we perform simulations in order to assess their power. While we find no evidence of the hot hand in game-play and only weak evidence in free throw trials, we find that some NBA players exhibit behavioral changes based on the outcome of their previous shot.
Submitted 29 October, 2020;
originally announced October 2020.
-
Ultrahyperbolic Representation Learning
Authors:
Marc T. Law,
Jos Stam
Abstract:
In machine learning, data is usually represented in a (flat) Euclidean space where distances between points are along straight lines. Researchers have recently considered more exotic (non-Euclidean) Riemannian manifolds such as hyperbolic space which is well suited for tree-like data. In this paper, we propose a representation living on a pseudo-Riemannian manifold of constant nonzero curvature. It is a generalization of hyperbolic and spherical geometries where the nondegenerate metric tensor need not be positive definite. We provide the necessary learning tools in this geometry and extend gradient-based optimization techniques. More specifically, we provide closed-form expressions for distances via geodesics and define a descent direction to minimize some objective function. Our novel framework is applied to graph representations.
Submitted 10 January, 2021; v1 submitted 30 June, 2020;
originally announced July 2020.
-
Induction of Subgoal Automata for Reinforcement Learning
Authors:
Daniel Furelos-Blanco,
Mark Law,
Alessandra Russo,
Krysia Broda,
Anders Jonsson
Abstract:
In this work we present ISA, a novel approach for learning and exploiting subgoals in reinforcement learning (RL). Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance.
Submitted 29 November, 2019;
originally announced November 2019.
-
A Theoretical Analysis of the Number of Shots in Few-Shot Learning
Authors:
Tianshi Cao,
Marc Law,
Sanja Fidler
Abstract:
Few-shot classification is the task of predicting the category of an example from a set of few labeled examples. The number of labeled examples per category is called the number of shots (or shot number). Recent works tackle this task through meta-learning, where a meta-learner extracts information from observed tasks during meta-training to quickly adapt to new tasks during meta-testing. In this formulation, the number of shots exploited during meta-training has an impact on the recognition performance at meta-test time. Generally, the shot number used in meta-training should match the one used in meta-testing to obtain the best performance. We introduce a theoretical analysis of the impact of the shot number on Prototypical Networks, a state-of-the-art few-shot classification method. From our analysis, we propose a simple method that is robust to the choice of shot number used during meta-training, which is a crucial hyperparameter. Our model, trained with an arbitrary meta-training shot number, performs well across different meta-testing shot numbers. We experimentally demonstrate our approach on different few-shot classification benchmarks.
Submitted 14 February, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
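
To make the notion of "shots" concrete, here is a minimal sketch of the classification rule used by Prototypical Networks: each class prototype is the mean of that class's embedded support examples, and a query is assigned to the nearest prototype. The learned embedding network is omitted; the 2-D vectors, the labels, and the function names `class_prototypes` and `classify` are hypothetical and chosen only for illustration. The example uses two shots per class.

```python
from collections import defaultdict

def class_prototypes(support):
    """support: list of (embedding, label) pairs; returns label -> mean embedding."""
    sums, counts = {}, defaultdict(int)
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1  # counts[label] is the shot number for this class
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def classify(query, prototypes):
    """Return the label whose prototype is nearest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))

# Two-shot support set with already-embedded (hypothetical) examples.
support = [([0.0, 0.0], "a"), ([0.0, 2.0], "a"), ([4.0, 4.0], "b"), ([6.0, 4.0], "b")]
protos = class_prototypes(support)
print(protos)
print(classify([1.0, 1.5], protos))
```

With more shots per class, each prototype averages over more support examples, which is one intuitive reason the shot number affects the learned embedding.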
-
Optimal curtailed designs for single arm phase II clinical trials
Authors:
Martin Law,
Michael J. Grayling,
Adrian P. Mander
Abstract:
In single-arm phase II oncology trials, the most popular choice of design is Simon's two-stage design, which allows early stopping at one interim analysis. However, the expected trial sample size can be reduced further by allowing curtailment. Curtailment is stopping when the final go or no-go decision is certain, so-called non-stochastic curtailment, or very likely, known as stochastic curtailment.
In the context of single-arm phase II oncology trials, stochastic curtailment has previously been restricted to stopping in the second stage and/or stopping for a no-go decision only. We introduce two designs that incorporate stochastic curtailment and allow stopping after every observation, for either a go or no-go decision. We obtain optimal stopping boundaries by searching over a range of potential conditional powers, beyond which the trial will stop for a go or no-go decision. This search is novel: firstly, the search is undertaken over a range of values unique to each possible design realisation. Secondly, these values are evaluated taking into account the possibility of early stopping. Finally, each design realisation's operating characteristics are obtained exactly.
The proposed designs are compared to existing designs in a real data example. They are also compared under three scenarios, both with respect to four single optimality criteria and using a loss function.
The proposed designs are superior in almost all cases. Optimising for the expected sample size under either the null or alternative hypothesis, the saving compared to the popular Simon's design ranges from 22% to 55%.
Submitted 6 September, 2019;
originally announced September 2019.
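
As background for the abstract above, the exact operating characteristics of a plain Simon two-stage design (without curtailment) can be computed from binomial probabilities. The sketch below is not the proposed curtailed design: it assumes the standard rule of stopping for futility if at most `r1` responses are seen among the first `n1` patients, and otherwise declaring "go" if more than `r` responses are seen among all `n` patients. The function name and the design values (r1/n1 = 1/10, r/n = 5/29, with response rates 0.1 and 0.3) are illustrative assumptions.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k responses among n patients with response rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def simon_characteristics(r1, n1, r, n, p):
    """Return (P(go), P(early stop), expected sample size) at response rate p."""
    n2 = n - n1
    # Early termination: at most r1 responses in stage 1.
    pet = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    # "Go" requires continuing past stage 1 and more than r responses overall.
    p_go = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        needed = max(r + 1 - x1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(needed, n2 + 1))
        p_go += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return p_go, pet, expected_n

type_one, pet0, en0 = simon_characteristics(r1=1, n1=10, r=5, n=29, p=0.1)
power, _, _ = simon_characteristics(r1=1, n1=10, r=5, n=29, p=0.3)
print(f"type I error={type_one:.4f}, power={power:.4f}, PET(p0)={pet0:.4f}, EN(p0)={en0:.2f}")
```

Curtailment, as described in the abstract, would add further stopping points within each stage, which can only reduce the expected sample size relative to this baseline.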
-
Estimating the Random Effect in Big Data Mixed Models
Authors:
Michael Law,
Ya'acov Ritov
Abstract:
We consider three problems in high-dimensional Gaussian linear mixed models. Without any assumptions on the design for the fixed effects, we construct an asymptotic $F$-statistic for testing whether a collection of random effects is zero, derive an asymptotic confidence interval for a single random effect at the parametric rate $\sqrt{n}$, and propose an empirical Bayes estimator for a part of the mean vector in ANOVA type models that performs asymptotically as well as the oracle Bayes estimator. We support our results with numerical simulations and provide comparisons with oracle estimators. The procedures developed are applied to the Trends in International Mathematics and Science Study (TIMSS) data.
Submitted 27 July, 2019;
originally announced July 2019.
-
Centroid-based deep metric learning for speaker recognition
Authors:
Jixuan Wang,
Kuan-Chieh Wang,
Marc Law,
Frank Rudzicz,
Michael Brudno
Abstract:
Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model…
▽ More
Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model is evaluated on unseen classes. Here, we optimize a speaker embedding model with prototypical network loss (PNL), a state-of-the-art approach for the few-shot image classification task. The resulting embedding model outperforms the state-of-the-art triplet loss based models in both speaker verification and identification tasks, for both seen and unseen speakers.
△ Less
Submitted 6 February, 2019;
originally announced February 2019.
-
A matrix-based method of moments for fitting multivariate network meta-analysis models with multiple outcomes and random inconsistency effects
Authors:
Dan Jackson,
Sylwia Bujkiewicz,
Martin Law,
Richard D Riley,
Ian White
Abstract:
Random-effects meta-analyses are very commonly used in medical statistics. Recent methodological developments include multivariate (multiple outcomes) and network (multiple treatments) meta-analysis. Here we provide a new model and corresponding estimation procedure for multivariate network meta-analysis, so that multiple outcomes and treatments can be included in a single analysis. Our new multivariate model is a direct extension of a univariate model for network meta-analysis that has recently been proposed. We allow two types of unknown variance parameters in our model, which represent between-study heterogeneity and inconsistency. Inconsistency arises when different forms of direct and indirect evidence are not in agreement, even having taken between-study heterogeneity into account. However, consistency is often assumed in practice, and so we also explain how to fit a reduced model which makes this assumption. Our estimation method extends several other commonly used methods for meta-analysis, including the method proposed by DerSimonian and Laird (1986). We investigate the use of our proposed methods in the context of a real example.
Submitted 25 May, 2017;
originally announced May 2017.