
On the optimization of discrepancy measures

Authors: François Clément, Nathan Kirk, Art B. Owen, T. Konstantin Rusch

Abstract: Points in the unit cube with low discrepancy can be constructed using algebra or, more recently, by direct computational optimization of a criterion. The usual $L_\infty$ star discrepancy is a poor criterion for this because it is computationally expensive and lacks differentiability. Its usual replacement, the $L_2$ star discrepancy, is smooth but exhibits other pathologies shown by J. Matoušek. In an attempt to address these problems, we introduce the \textit{average squared discrepancy} which averages over $2^d$ versions of the $L_2$ star discrepancy anchored in the different vertices of $[0,1]^d$. Not only can this criterion be computed in $O(dn^2)$ time, like the $L_2$ star discrepancy, but also we show that it is equivalent to a weighted symmetric $L_2$ criterion of Hickernell's by a constant factor. We compare this criterion with a wide range of traditional discrepancy measures, and show that only the average squared discrepancy avoids the problems raised by Matoušek. Furthermore, we present a comprehensive numerical study showing in particular that optimizing for the average squared discrepancy leads to strong performance for the $L_2$ star discrepancy, whereas the converse does not hold.

Submitted 6 August, 2025; originally announced August 2025.

Comments: 22 pages, 3 Figures, 4 Tables
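
The abstract above notes that the $L_2$ star discrepancy can be computed in $O(dn^2)$ time. As a minimal sketch of why, the classical closed form usually attributed to Warnock gives $D^2 = 3^{-d} - \frac{2}{n}\sum_i \prod_k \frac{1 - x_{ik}^2}{2} + \frac{1}{n^2}\sum_{i,j}\prod_k \bigl(1 - \max(x_{ik}, x_{jk})\bigr)$. The function name `l2_star_discrepancy_squared` is hypothetical, and the code covers only the standard $L_2$ star discrepancy; it is not the average squared discrepancy introduced in the paper and is not taken from the authors' code.

```python
from math import prod


def l2_star_discrepancy_squared(points):
    """Squared L2 star discrepancy of points in [0, 1]^d (Warnock's formula).

    `points` is a non-empty list of equal-length coordinate lists.
    The double sum over pairs dominates the cost, giving O(d * n^2) time.
    """
    n = len(points)
    d = len(points[0])

    # Constant term: integral of the squared box volume over [0, 1]^d.
    constant = 3.0 ** (-d)

    # Single sum: one product over the d coordinates per point.
    single = sum(prod((1.0 - x * x) / 2.0 for x in p) for p in points)

    # Double sum: one product over the d coordinates per ordered pair.
    double = sum(
        prod(1.0 - max(a, b) for a, b in zip(p, q))
        for p in points
        for q in points
    )

    return constant - 2.0 * single / n + double / (n * n)
```

For a single point at 0.5 in one dimension the formula reduces to 1/3 - 3/4 + 1/2 = 1/12, which matches direct integration of the squared local discrepancy.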

arXiv:2503.21103

Low Stein Discrepancy via Message-Passing Monte Carlo

Authors: Nathan Kirk, T. Konstantin Rusch, Jakob Zech, Daniela Rus

Abstract: Message-Passing Monte Carlo (MPMC) was recently introduced as a novel low-discrepancy sampling approach leveraging tools from geometric deep learning. While originally designed for generating uniform point sets, we extend this framework to sample from general multivariate probability distributions with known probability density function. Our proposed method, Stein-Message-Passing Monte Carlo (Stein-MPMC), minimizes a kernelized Stein discrepancy, ensuring improved sample quality. Finally, we show that Stein-MPMC outperforms competing methods, such as Stein Variational Gradient Descent and (greedy) Stein Points, by achieving a lower Stein discrepancy.

Submitted 26 March, 2025; originally announced March 2025.

Comments: 8 pages, 2 figures, Accepted at the ICLR 2025 Workshop on Frontiers in Probabilistic Inference

arXiv:2405.15059

doi 10.1073/pnas.2409913121

Message-Passing Monte Carlo: Generating low-discrepancy point sets via Graph Neural Networks

Authors: T. Konstantin Rusch, Nathan Kirk, Michael M. Bronstein, Christiane Lemieux, Daniela Rus

Abstract: Discrepancy is a well-known measure for the irregularity of the distribution of a point set. Point sets with small discrepancy are called low-discrepancy and are known to efficiently fill the space in a uniform manner. Low-discrepancy points play a central role in many problems in science and engineering, including numerical integration, computer vision, machine perception, computer graphics, machine learning, and simulation. In this work, we present the first machine learning approach to generate a new class of low-discrepancy point sets named Message-Passing Monte Carlo (MPMC) points. Motivated by the geometric nature of generating low-discrepancy point sets, we leverage tools from Geometric Deep Learning and base our model on Graph Neural Networks. We further provide an extension of our framework to higher dimensions, which flexibly allows the generation of custom-made points that emphasize the uniformity in specific dimensions that are primarily important for the particular problem at hand. Finally, we demonstrate that our proposed model achieves state-of-the-art performance superior to previous methods by a significant margin. In fact, MPMC points are empirically shown to be either optimal or near-optimal with respect to the discrepancy for low dimension and small number of points, i.e., for which the optimal discrepancy can be determined. Code for generating MPMC points can be found at https://github.com/tk-rusch/MPMC.

Submitted 26 September, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: Published in Proceedings of the National Academy of Sciences (PNAS): https://www.pnas.org/doi/10.1073/pnas.2409913121

arXiv:2302.03580

Multi-Scale Message Passing Neural PDE Solvers

Authors: Léonard Equer, T. Konstantin Rusch, Siddhartha Mishra

Abstract: We propose a novel multi-scale message passing neural network algorithm for learning the solutions of time-dependent PDEs. Our algorithm possesses both temporal and spatial multi-scale resolution features by incorporating multi-scale sequence models and graph gating modules in the encoder and processor, respectively. Benchmark numerical experiments are presented to demonstrate that the proposed algorithm outperforms baselines, particularly on a PDE with a range of spatial and temporal scales.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2202.02296

Graph-Coupled Oscillator Networks

Authors: T. Konstantin Rusch, Benjamin P. Chamberlain, James Rowbottom, Siddhartha Mishra, Michael M. Bronstein

Abstract: We propose Graph-Coupled Oscillator Networks (GraphCON), a novel framework for deep learning on graphs. It is based on discretizations of a second-order system of ordinary differential equations (ODEs), which model a network of nonlinear controlled and damped oscillators, coupled via the adjacency structure of the underlying graph. The flexibility of our framework permits any basic GNN layer (e.g. convolutional or attentional) as the coupling function, from which a multi-layer deep neural network is built up via the dynamics of the proposed ODEs. We relate the oversmoothing problem, commonly encountered in GNNs, to the stability of steady states of the underlying ODE and show that zero-Dirichlet energy steady states are not stable for our proposed ODEs. This demonstrates that the proposed framework mitigates the oversmoothing problem. Moreover, we prove that GraphCON mitigates the exploding and vanishing gradients problem to facilitate training of deep multi-layer GNNs. Finally, we show that our approach offers competitive performance with respect to the state-of-the-art on a variety of graph-based learning tasks.

Submitted 23 June, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: ICML 2022

arXiv:2110.04744

Long Expressive Memory for Sequence Modeling

Authors: T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson, Michael W. Mahoney

Abstract: We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, it can efficiently process sequential tasks with very long-term dependencies, and it is sufficiently expressive to be able to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.

Submitted 25 February, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

Comments: ICLR 2022

arXiv:2103.05487

UnICORNN: A recurrent model for learning very long time dependencies

Authors: T. Konstantin Rusch, Siddhartha Mishra

Abstract: The design of recurrent neural networks (RNNs) to accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture which is based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time), memory efficient and we derive rigorous bounds on the hidden state gradients to prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments are presented to demonstrate that the proposed RNN provides state of the art performance on a variety of learning tasks with (very) long-time dependencies.

Submitted 10 June, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

Report number: PMLR 139:9168-9178, 2021

arXiv:2009.02713

Higher-order Quasi-Monte Carlo Training of Deep Neural Networks

Authors: M. Longo, S. Mishra, T. K. Rusch, Ch. Schwab

Abstract: We present a novel algorithmic approach and an error analysis leveraging Quasi-Monte Carlo points for training deep neural network (DNN) surrogates of Data-to-Observable (DtO) maps in engineering design. Our analysis reveals higher-order consistent, deterministic choices of training points in the input data space for deep and shallow Neural Networks with holomorphic activation functions such as tanh. These novel training points are proved to facilitate higher-order decay (in terms of the number of training samples) of the underlying generalization error, with consistency error bounds that are free from the curse of dimensionality in the input data space, provided that DNN weights in hidden layers satisfy certain summability conditions. We present numerical experiments for DtO maps from elliptic and parabolic PDEs with uncertain inputs that confirm the theoretical analysis.

Submitted 6 September, 2020; originally announced September 2020.

arXiv:2005.12564

Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences

Authors: Siddhartha Mishra, T. Konstantin Rusch

Abstract: We propose a deep supervised learning algorithm based on low-discrepancy sequences as the training set. By a combination of theoretical arguments and extensive numerical experiments we demonstrate that the proposed algorithm significantly outperforms standard deep learning algorithms that are based on randomly chosen training data, for problems in moderately high dimensions. The proposed algorithm provides an efficient method for building inexpensive surrogates for many underlying maps in the context of scientific computing.

Submitted 26 May, 2020; originally announced May 2020.

Showing 1–9 of 9 results for author: Rusch, T K