-
Estimating Uncertainty with Implicit Quantile Network
Authors:
Yi Hung Lim
Abstract:
Uncertainty quantification is an important part of many performance critical applications. This paper provides a simple alternative to existing approaches such as ensemble learning and bayesian neural networks. By directly modeling the loss distribution with an Implicit Quantile Network, we get an estimate of how uncertain the model is of its predictions. For experiments with MNIST and CIFAR datas…
▽ More
Uncertainty quantification is an important part of many performance critical applications. This paper provides a simple alternative to existing approaches such as ensemble learning and bayesian neural networks. By directly modeling the loss distribution with an Implicit Quantile Network, we get an estimate of how uncertain the model is of its predictions. For experiments with MNIST and CIFAR datasets, the mean of the estimated loss distribution is 2x higher for incorrect predictions. When data with high estimated uncertainty is removed from the test dataset, the accuracy of the model goes up as much as 10%. This method is simple to implement while offering important information to applications where the user has to know when the model could be wrong (e.g. deep learning for healthcare).
△ Less
Submitted 26 August, 2024;
originally announced August 2024.
-
Parallelizing non-linear sequential models over the sequence length
Authors:
Yi Heng Lim,
Qi Zhu,
Joshua Selfridge,
Muhammad Firmansyah Kasim
Abstract:
Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought sequential models could not be parallelized. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models b…
▽ More
Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought sequential models could not be parallelized. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude faster without compromising output accuracy. The algorithm does not need any special structure in the sequential models' architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than the common sequential method without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step to unlock the potential of non-linear sequential models for long sequence problems.
△ Less
Submitted 16 January, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Constants of motion network
Authors:
Muhammad Firmansyah Kasim,
Yi Heng Lim
Abstract:
The beauty of physics is that there is usually a conserved quantity in an always-changing system, known as the constant of motion. Finding the constant of motion is important in understanding the dynamics of the system, but typically requires mathematical proficiency and manual analytical work. In this paper, we present a neural network that can simultaneously learn the dynamics of the system and…
▽ More
The beauty of physics is that there is usually a conserved quantity in an always-changing system, known as the constant of motion. Finding the constant of motion is important in understanding the dynamics of the system, but typically requires mathematical proficiency and manual analytical work. In this paper, we present a neural network that can simultaneously learn the dynamics of the system and the constants of motion from data. By exploiting the discovered constants of motion, it can produce better predictions on dynamics and can work on a wider range of systems than Hamiltonian-based neural networks. In addition, the training progresses of our method can be used as an indication of the number of constants of motion in a system which could be useful in studying a novel physical system.
△ Less
Submitted 4 October, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Unifying physical systems' inductive biases in neural ODE using dynamics constraints
Authors:
Yi Heng Lim,
Muhammad Firmansyah Kasim
Abstract:
Conservation of energy is at the core of many physical phenomena and dynamical systems. There have been a significant number of works in the past few years aimed at predicting the trajectory of motion of dynamical systems using neural networks while adhering to the law of conservation of energy. Most of these works are inspired by classical mechanics such as Hamiltonian and Lagrangian mechanics as…
▽ More
Conservation of energy is at the core of many physical phenomena and dynamical systems. There have been a significant number of works in the past few years aimed at predicting the trajectory of motion of dynamical systems using neural networks while adhering to the law of conservation of energy. Most of these works are inspired by classical mechanics such as Hamiltonian and Lagrangian mechanics as well as Neural Ordinary Differential Equations. While these works have been shown to work well in specific domains respectively, there is a lack of a unifying method that is more generally applicable without requiring significant changes to the neural network architectures. In this work, we aim to address this issue by providing a simple method that could be applied to not just energy-conserving systems, but also dissipative systems, by including a different inductive bias in different cases in the form of a regularisation term in the loss function. The proposed method does not require changing the neural network architecture and could form the basis to validate a novel idea, therefore showing promises to accelerate research in this direction.
△ Less
Submitted 3 August, 2022;
originally announced August 2022.
-
Reducing the Long Tail Losses in Scientific Emulations with Active Learning
Authors:
Yi Heng Lim,
Muhammad Firmansyah Kasim
Abstract:
Deep-learning-based models are increasingly used to emulate scientific simulations to accelerate scientific research. However, accurate, supervised deep learning models require huge amount of labelled data, and that often becomes the bottleneck in employing neural networks. In this work, we leveraged an active learning approach called core-set selection to actively select data, per a pre-defined b…
▽ More
Deep-learning-based models are increasingly used to emulate scientific simulations to accelerate scientific research. However, accurate, supervised deep learning models require huge amount of labelled data, and that often becomes the bottleneck in employing neural networks. In this work, we leveraged an active learning approach called core-set selection to actively select data, per a pre-defined budget, to be labelled for training. To further improve the model performance and reduce the training costs, we also warm started the training using a shrink-and-perturb trick. We tested on two case studies in different fields, namely galaxy halo occupation distribution modelling in astrophysics and x-ray emission spectroscopy in plasma physics, and the results are promising: we achieved competitive overall performance compared to using a random sampling baseline, and more importantly, successfully reduced the larger absolute losses, i.e. the long tail in the loss distribution, at virtually no overhead costs.
△ Less
Submitted 9 January, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Compression and Symmetry of Small-World Graphs and Structures
Authors:
Ioannis Kontoyiannis,
Yi Heng Lim,
Katia Papakonstantinopoulou,
Wojtek Szpankowski
Abstract:
For various purposes and, in particular, in the context of data compression, a graph can be examined at three levels. Its structure can be described as the unlabeled version of the graph; then the labeling of its structure can be added; and finally, given then structure and labeling, the contents of the labels can be described. Determining the amount of information present at each level and quanti…
▽ More
For various purposes and, in particular, in the context of data compression, a graph can be examined at three levels. Its structure can be described as the unlabeled version of the graph; then the labeling of its structure can be added; and finally, given then structure and labeling, the contents of the labels can be described. Determining the amount of information present at each level and quantifying the degree of dependence between them, requires the study of symmetry, graph automorphism, entropy, and graph compressibility. In this paper, we focus on a class of small-world graphs. These are geometric random graphs where vertices are first connected to their nearest neighbors on a circle and then pairs of non-neighbors are connected according to a distance-dependent probability distribution. We establish the degree distribution of this model, and use it to prove the model's asymmetry in an appropriate range of parameters. Then we derive the relevant entropy and structural entropy of these random graphs, in connection with graph compression.
△ Less
Submitted 22 November, 2021; v1 submitted 31 July, 2020;
originally announced July 2020.