-
Uniqueness of asymptotically cylindrical steady gradient Ricci solitons
Authors:
Michael B. Law
Abstract:
We show that the Bryant soliton is the unique asymptotically cylindrical steady gradient Ricci soliton, in any dimension $n \geq 3$ and without any curvature assumptions. This generalizes a celebrated theorem of Brendle. We also prove that any steady gradient Ricci soliton asymptotic to a cylinder over the homogeneous lens space $\mathbb{S}^{2m+1}/\mathbb{Z}_k = L_{m,k}$, for $m \geq 1$ and $k \geq 3$, is a noncollapsed Appleton soliton on the complex line bundle $O(-k)$ over $\mathbb{CP}^m$. Specializing to dimension 4, we classify steady gradient Ricci soliton singularity models on smooth orbifolds with tangent flows at infinity of the form $(SU(2)/\Gamma) \times \mathbb{R}$.
Submitted 3 July, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
Drift-harmonic functions with polynomial growth on asymptotically paraboloidal manifolds
Authors:
Michael B. Law
Abstract:
We construct and classify all polynomial growth solutions to certain drift-harmonic equations on complete manifolds with paraboloidal asymptotics. These encompass the natural drift-harmonic equations on certain steady gradient Ricci solitons. Specifically, we show that all drift-harmonic functions with polynomial growth asymptotically separate variables, and compute the dimensions of spaces of drift-harmonic functions with a given polynomial growth rate. The proof uses an inductive argument that alternates between constructing and asymptotically controlling drift-harmonic functions.
Submitted 13 February, 2025; v1 submitted 9 January, 2025;
originally announced January 2025.
-
Neural Spacetimes for DAG Representation Learning
Authors:
Haitz Sáez de Ocáriz Borde,
Anastasis Kratsios,
Marc T. Law,
Xiaowen Dong,
Michael Bronstein
Abstract:
We propose a class of trainable deep learning-based geometries called Neural Spacetimes (NSTs), which can universally represent nodes in weighted directed acyclic graphs (DAGs) as events in a spacetime manifold. While most works in the literature focus on undirected graph representation learning or causality embedding separately, our differentiable geometry can encode both graph edge weights in its spatial dimensions and causality in the form of edge directionality in its temporal dimensions. We use a product manifold that combines a quasi-metric (for space) and a partial order (for time). NSTs are implemented as three neural networks trained in an end-to-end manner: an embedding network, which learns to optimize the location of nodes as events in the spacetime manifold, and two other networks that optimize the space and time geometries in parallel, which we call a neural (quasi-)metric and a neural partial order, respectively. The latter two networks leverage recent ideas at the intersection of fractal geometry and deep learning to shape the geometry of the representation space in a data-driven fashion, unlike other works in the literature that use fixed spacetime manifolds such as Minkowski space or de Sitter space to embed DAGs. Our main theoretical guarantee is a universal embedding theorem, showing that any $k$-point DAG can be embedded into an NST with $1+\mathcal{O}(\log(k))$ distortion while exactly preserving its causal structure. The total number of parameters defining the NST is sub-cubic in $k$ and linear in the width of the DAG. If the DAG has a planar Hasse diagram, this is improved to $\mathcal{O}(\log(k)) + 2$ spatial and 2 temporal dimensions. We validate our framework computationally with synthetic weighted DAGs and real-world network embeddings; in both cases, the NSTs achieve lower embedding distortions than their counterparts using fixed spacetime geometries.
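The two halves of the embedding guarantee (exact causal structure, bounded spatial distortion) can be checked mechanically for a given embedding. The sketch below is a toy illustration under stated assumptions, not the NST architecture: it replaces the learned neural partial order with the fixed componentwise (product) order on time coordinates, uses hand-picked coordinates for a hypothetical four-node DAG, and measures distortion with the standard bi-Lipschitz definition (largest expansion times largest contraction). All names (`EDGES`, `TIME`, `preserves_causality`, `distortion`) are illustrative.

```python
from itertools import product

# Hypothetical toy DAG (edges u -> v) and hand-picked 2-D time coordinates.
# These are illustrative values, not coordinates learned by an NST.
EDGES = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
TIME = {"a": (0.0, 0.0), "b": (1.0, 0.5), "c": (0.5, 1.0), "d": (2.0, 2.0)}

def reachable(edges):
    """Transitive closure: all pairs (u, v) joined by a directed path u ~> v."""
    reach = set(edges)
    changed = True
    while changed:
        changed = False
        for (u, v), (w, x) in product(list(reach), repeat=2):
            if v == w and (u, x) not in reach:
                reach.add((u, x))
                changed = True
    return reach

def precedes(s, t):
    """Strict product order: s <= t in every coordinate and s != t (assumed stand-in)."""
    return all(a <= b for a, b in zip(s, t)) and s != t

def preserves_causality(edges, time):
    """True iff 'v is reachable from u' coincides with 'time[u] precedes time[v]'."""
    reach = reachable(edges)
    nodes = list(time)
    return all(((u, v) in reach) == precedes(time[u], time[v])
               for u in nodes for v in nodes if u != v)

def distortion(d_graph, d_embed, pairs):
    """Bi-Lipschitz distortion over the given ordered pairs (works for quasi-metrics)."""
    expand = max(d_embed[p] / d_graph[p] for p in pairs)
    contract = max(d_graph[p] / d_embed[p] for p in pairs)
    return expand * contract

print(preserves_causality(EDGES, TIME))  # True: b and c are incomparable, as in the DAG
```

Using a fixed product order is only one choice of partial order; the point of the NST construction is that this order (and the spatial quasi-metric) is learned rather than fixed.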
Submitted 9 March, 2025; v1 submitted 25 August, 2024;
originally announced August 2024.
-
Concavity for elliptic and parabolic equations in locally symmetric spaces with nonnegative curvature
Authors:
Shrey Aryan,
Michael B. Law
Abstract:
We establish a concavity principle for solutions to elliptic and parabolic equations on locally symmetric spaces with nonnegative sectional curvature, extending the results of Langford and Scheuer. To the best of our knowledge, this is the first general concavity principle established on spaces with non-constant sectional curvature.
Submitted 20 March, 2025; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts
Authors:
Anastasis Kratsios,
Haitz Sáez de Ocáriz Borde,
Takashi Furuya,
Marc T. Law
Abstract:
Mixture-of-Experts (MoEs) can scale up beyond traditional deep learning models by employing a routing strategy in which each input is processed by a single "expert" deep learning model. This strategy allows us to scale up the number of parameters defining the MoE while maintaining sparse activation, i.e., MoEs only load a small number of their total parameters into GPU VRAM for the forward pass depending on the input. In this paper, we provide an approximation and learning-theoretic analysis of mixtures of expert MLPs with (P)ReLU activation functions. We first prove that for every error level $\varepsilon>0$ and every Lipschitz function $f:[0,1]^n\to \mathbb{R}$, one can construct a MoMLP model (a Mixture-of-Experts comprising (P)ReLU MLPs) which uniformly approximates $f$ to $\varepsilon$ accuracy over $[0,1]^n$, while only requiring networks of $\mathcal{O}(\varepsilon^{-1})$ parameters to be loaded in memory. Additionally, we show that MoMLPs can generalize since the entire MoMLP model has a (finite) VC dimension of $\tilde{O}(L\max\{nL,JW\})$, if there are $L$ experts and each expert has a depth and width of $J$ and $W$, respectively.
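The sparse-activation idea (each input is handled by exactly one expert, so only that expert's weights are needed) can be sketched as follows. This is a minimal illustration under an assumed router: inputs go to the expert whose hypothetical prototype point is nearest, which need not match the paper's routing construction. The helper names and the two toy experts are illustrative.

```python
def relu_mlp(weights, biases, x):
    """Evaluate a ReLU MLP given per-layer weight matrices (lists of rows) and bias vectors."""
    h = list(x)
    for layer, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i for row, b_i in zip(W, b)]
        # ReLU on hidden layers only; the output layer is affine.
        h = z if layer == len(weights) - 1 else [max(0.0, t) for t in z]
    return h

def route(prototypes, x):
    """Assumed top-1 router: index of the nearest prototype in squared Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(prototypes[i], x)))

def momlp_forward(prototypes, experts, x):
    """Only the selected expert's parameters are touched for this input."""
    k = route(prototypes, x)
    weights, biases = experts[k]
    return k, relu_mlp(weights, biases, x)

# Hypothetical 1-D example on [0, 1]: expert 0 computes x, expert 1 computes 1 - x.
PROTOTYPES = [[0.25], [0.75]]
EXPERTS = [
    ([[[1.0]], [[1.0]]], [[0.0], [0.0]]),
    ([[[1.0]], [[-1.0]]], [[0.0], [1.0]]),
]
print(momlp_forward(PROTOTYPES, EXPERTS, [0.1]))  # (0, [0.1])
```

Together the two experts approximate the "tent" function min(x, 1 - x) piecewise, with each evaluation loading a single small network.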
Submitted 25 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Positive mass and Dirac operators on weighted manifolds and smooth metric measure spaces
Authors:
Michael B. Law,
Isaac M. Lopez,
Daniel Santiago
Abstract:
We establish a weighted positive mass theorem which unifies and generalizes results of Baldauf--Ozuch and Chu--Zhu. Our result is in fact equivalent to the usual positive mass theorem, and can be regarded as a positive mass theorem for smooth metric measure spaces. We also study Dirac operators on certain warped product manifolds associated to smooth metric measure spaces. Applications of this include, among others, an alternative proof for a special case of our positive mass theorem, eigenvalue bounds for the Dirac operator on closed spin manifolds, and a new way to understand the weighted Dirac operator using warped products.
Submitted 29 February, 2024; v1 submitted 24 December, 2023;
originally announced December 2023.
-
Distributional Robustness and Transfer Learning Through Empirical Bayes
Authors:
Michael Law,
Peter Bühlmann,
Ya'acov Ritov
Abstract:
We consider the problem of statistical inference on parameters of a target population when auxiliary observations are available from related populations. We propose a flexible empirical Bayes approach that can be applied on top of any asymptotically linear estimator to incorporate information from related populations when constructing confidence regions. The proposed methodology is valid regardless of whether there are direct observations on the population of interest. We demonstrate the performance of the empirical Bayes confidence regions on synthetic data as well as on data from the Trends in International Mathematics and Science Study (TIMSS), using the debiased Lasso as the base algorithm for high-dimensional regression.
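The general empirical Bayes idea can be illustrated in the simplest scalar setting. The sketch below is a textbook normal-normal construction, not the procedure of the paper: it assumes each population's estimate is approximately normal (as asymptotic linearity suggests), fits a N(mu, tau^2) prior to the related populations by the method of moments, and plugs it in without accounting for the estimation error in mu and tau^2. It covers both cases mentioned above: with and without a direct estimate for the target. Function names are hypothetical.

```python
from statistics import NormalDist, fmean, variance

def fit_prior(aux_estimates, aux_ses):
    """Method-of-moments fit of a N(mu, tau^2) prior from related populations."""
    mu = fmean(aux_estimates)
    # Between-population variance = total variance minus average sampling variance.
    tau2 = max(0.0, variance(aux_estimates) - fmean([s * s for s in aux_ses]))
    return mu, tau2

def eb_interval(aux_estimates, aux_ses, target_est=None, target_se=None, level=0.95):
    """Naive plug-in empirical Bayes interval for the target parameter."""
    mu, tau2 = fit_prior(aux_estimates, aux_ses)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    if target_est is None:
        # No direct data: the prior itself describes the target parameter.
        center, var = mu, tau2
    else:
        s2 = target_se ** 2
        w = tau2 / (tau2 + s2)             # weight on the direct estimate
        center = w * target_est + (1 - w) * mu
        var = w * s2                       # posterior variance tau2*s2/(tau2+s2)
    half = z * var ** 0.5
    return center - half, center + half

# Three related populations, plus a direct estimate 4.0 (s.e. 0.5) for the target.
print(eb_interval([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], 4.0, 0.5))  # centered at 3.5
```

Shrinking toward the related populations narrows the interval relative to the direct estimate alone; how much depends on the fitted between-population spread tau^2.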
Submitted 13 December, 2023;
originally announced December 2023.
-
Rank-Constrained Least-Squares: Prediction and Inference
Authors:
Michael Law,
Ya'acov Ritov,
Ruixiang Zhang,
Ziwei Zhu
Abstract:
In this work, we focus on the high-dimensional trace regression model with a low-rank coefficient matrix. We establish a nearly optimal in-sample prediction risk bound for the rank-constrained least-squares estimator under no assumptions on the design matrix. Lying at the heart of the proof is a covering number bound for the family of projection operators corresponding to the subspaces spanned by the design. By leveraging this complexity result, we perform a power analysis for a permutation test on the existence of a low-rank signal under the high-dimensional trace regression model. We show that the permutation test based on the rank-constrained least-squares estimator achieves non-trivial power with no assumptions on the minimum (restricted) eigenvalue of the covariance matrix of the design. Finally, in our numerical study, we use alternating minimization to approximately solve the rank-constrained least-squares problem and evaluate its empirical in-sample prediction risk and the power of the resulting permutation test.
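Alternating minimization works because, with one factor fixed, the trace regression prediction is linear in the other factor, so each half-step is an ordinary least-squares problem. The sketch below assumes rank 1 (coefficient matrix Theta = u v^T, so that <X_i, Theta> = u^T X_i v) and plain Python lists; the general rank-r case alternates over d1-by-r and d2-by-r factors in the same way. Each half-step exactly minimizes over one block, so the recorded objective is non-increasing, though the method is not guaranteed to reach the global minimizer. All names and the synthetic data are illustrative.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def least_squares(feats, y):
    """Normal equations for min_w sum_i (y_i - <w, feats_i>)^2."""
    d = len(feats[0])
    A = [[sum(f[i] * f[j] for f in feats) for j in range(d)] for i in range(d)]
    b = [sum(f[i] * yi for f, yi in zip(feats, y)) for i in range(d)]
    return solve(A, b)

def matvec(X, v):
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def objective(Xs, y, u, v):
    """Residual sum of squares of the rank-1 fit u v^T."""
    return sum((yi - sum(ui * a for ui, a in zip(u, matvec(X, v)))) ** 2
               for X, yi in zip(Xs, y))

def als_rank1(Xs, y, iters=20, seed=0):
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(len(Xs[0][0]))]
    history = []
    for _ in range(iters):
        u = least_squares([matvec(X, v) for X in Xs], y)             # fix v: u^T (X v)
        v = least_squares([matvec(transpose(X), u) for X in Xs], y)  # fix u: (X^T u)^T v
        history.append(objective(Xs, y, u, v))
    return u, v, history

# Usage on synthetic noiseless rank-1 data (d1 = 3, d2 = 4, n = 40).
rng = random.Random(1)
d1, d2, n = 3, 4, 40
u_true = [rng.gauss(0, 1) for _ in range(d1)]
v_true = [rng.gauss(0, 1) for _ in range(d2)]
Xs = [[[rng.gauss(0, 1) for _ in range(d2)] for _ in range(d1)] for _ in range(n)]
y = [sum(u_true[i] * X[i][j] * v_true[j] for i in range(d1) for j in range(d2)) for X in Xs]
u_hat, v_hat, history = als_rank1(Xs, y)
```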
Submitted 17 April, 2022; v1 submitted 28 November, 2021;
originally announced November 2021.
-
High-Dimensional Varying Coefficient Models with Functional Random Effects
Authors:
Michael Law,
Ya'acov Ritov
Abstract:
We consider a sparse high-dimensional varying coefficient model with random effects, a flexible linear model allowing covariates and coefficients to depend functionally on time. For each individual, we observe discretely sampled responses and covariates as functions of time, as well as time-invariant covariates. Under sampling times that are either fixed and common or random and independent amongst individuals, we propose a projection procedure for the empirical estimation of all varying coefficients. We extend this estimator to construct confidence bands for a fixed number of varying coefficients.
Submitted 12 October, 2021;
originally announced October 2021.
-
Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach
Authors:
Rafid Mahmood,
Sanja Fidler,
Marc T. Law
Abstract:
Active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label. The large scale of data sets used in deep learning forces most sample selection strategies to employ efficient heuristics. This paper introduces an integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool. We demonstrate that this problem can be tractably solved with a Generalized Benders Decomposition algorithm. Our strategy uses high-quality latent features that can be obtained by unsupervised learning on the unlabeled pool. Numerical results on several data sets show that our optimization approach is competitive with baselines and particularly outperforms them in the low budget regime where less than one percent of the data set is labeled.
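The selection objective can be made concrete with one common simplification. If the pool is given uniform weight and the weights on the selected points are free, the smallest achievable 1-Wasserstein distance equals the average distance from each pool point to its nearest selected point, because each point can ship its mass there and cannot do better. The sketch below evaluates that quantity and, instead of the paper's integer program solved by Generalized Benders Decomposition, uses a plain greedy heuristic as a swapped-in stand-in. Raw coordinates stand in for the learned latent features; all names are hypothetical.

```python
import math

def w1_to_coreset(pool, core_idx):
    """W1 from the uniform pool measure to the best-weighted measure on the core set:
    every pool point sends its mass to its nearest selected point."""
    return sum(min(math.dist(x, pool[j]) for j in core_idx) for x in pool) / len(pool)

def greedy_coreset(pool, budget):
    """Hypothetical greedy heuristic: repeatedly add the point that most reduces W1."""
    chosen = []
    for _ in range(budget):
        best = min((i for i in range(len(pool)) if i not in chosen),
                   key=lambda i: w1_to_coreset(pool, chosen + [i]))
        chosen.append(best)
    return chosen

# Two well-separated clusters: a budget of 2 should pick one point from each.
POOL = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
core = greedy_coreset(POOL, 2)
print(core, w1_to_coreset(POOL, core))  # cost 0.5
```

Greedy selection has no optimality guarantee for this objective; solving the integer program exactly, as the abstract describes, is what distinguishes the paper's approach from such heuristics.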
Submitted 6 March, 2023; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Estimating the Random Effect in Big Data Mixed Models
Authors:
Michael Law,
Ya'acov Ritov
Abstract:
We consider three problems in high-dimensional Gaussian linear mixed models. Without any assumptions on the design for the fixed effects, we construct an asymptotic $F$-statistic for testing whether a collection of random effects is zero, derive an asymptotic confidence interval for a single random effect at the parametric rate $\sqrt{n}$, and propose an empirical Bayes estimator for a part of the mean vector in ANOVA-type models that performs asymptotically as well as the oracle Bayes estimator. We support our results with numerical simulations and provide comparisons with oracle estimators. The procedures developed are applied to the Trends in International Mathematics and Science Study (TIMSS) data.
Submitted 27 July, 2019;
originally announced July 2019.
-
Inference Without Compatibility
Authors:
Michael Law,
Ya'acov Ritov
Abstract:
We consider hypothesis testing problems for three parameters in high-dimensional linear models with minimal sparsity assumptions of their type but without any compatibility conditions. Under this framework, we construct the first $\sqrt{n}$-consistent estimators for low-dimensional coefficients, the signal strength, and the noise level. We support our results using numerical simulations and provide comparisons with other estimators.
Submitted 21 January, 2020; v1 submitted 14 March, 2019;
originally announced March 2019.
-
Linear spaces with a line-transitive point-imprimitive automorphism group and Fang-Li parameter gcd(k,r) at most eight
Authors:
Anton Betten,
Anne Delandtsheer,
Maska Law,
Alice C. Niemeyer,
Cheryl E. Praeger,
Shenglin Zhou
Abstract:
In 1991, Weidong Fang and Huiling Li proved that there are only finitely many non-trivial linear spaces that admit a line-transitive, point-imprimitive group action, for a given value of gcd(k,r), where k is the line size and r is the number of lines on a point. The aim of this paper is to make that result effective. We obtain a classification of all linear spaces with this property having gcd(k,r) at most 8. To achieve this, we collect together existing theory, and prove additional theoretical restrictions of both a combinatorial and group-theoretic nature. These are organised into a series of algorithms that, for gcd(k,r) up to a given maximum value, return a list of candidate parameter values and candidate groups. We examine in detail each of the possibilities returned by these algorithms for gcd(k,r) at most 8, and complete the classification in this case.
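The parameters k and r are tied to the number of points v by elementary counting. In a linear space on v points with every line of size k (as forced by line-transitivity), each point P lies on r = (v - 1)/(k - 1) lines, since every other point shares exactly one line with P; the number of lines is b = vr/k. Point-imprimitivity additionally needs a block system of c classes of size d with v = cd and c, d > 1, so v cannot be prime. The sketch below checks only these basic necessary conditions; it is not one of the paper's algorithms, which impose many further combinatorial and group-theoretic restrictions. Function names are hypothetical.

```python
from math import gcd

def linear_space_params(v, k):
    """Return (r, b, gcd(k, r)) for a non-trivial linear space on v points with all
    lines of size k, or None if the basic counting conditions fail."""
    if not (2 < k < v):
        return None
    if (v - 1) % (k - 1):
        return None
    r = (v - 1) // (k - 1)   # lines through each point
    if (v * r) % k:
        return None
    b = v * r // k           # total number of lines
    return r, b, gcd(k, r)

def block_systems(v):
    """Nontrivial factorizations v = c * d (c classes of size d); empty if v is prime."""
    return [(v // d, d) for d in range(2, v) if v % d == 0]

# Parameters of the projective plane PG(2,4): v = 21 points, k = 5 points per line.
print(linear_space_params(21, 5), block_systems(21))  # (5, 21, 5) [(7, 3), (3, 7)]
```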
Submitted 31 January, 2007; v1 submitted 23 January, 2007;
originally announced January 2007.