-
Spatial-aware decision-making with ring attractors in reinforcement learning systems
Authors:
Marcos Negre Saura,
Richard Allmendinger,
Wei Pan,
Theodore Papamarkou
Abstract:
This paper explores the integration of ring attractors, a mathematical model inspired by neural circuit dynamics, into the Reinforcement Learning (RL) action selection process. Serving as specialized brain-inspired structures that encode spatial information and uncertainty, ring attractors offer a biologically plausible mechanism to improve learning speed and accuracy in RL. They do so by explicitly encoding the action space, facilitating the organization of neural activity, and enabling the distribution of spatial representations across the neural network in the context of Deep Reinforcement Learning (DRL). Examples include preserving the continuity between rotation angles in robotic control or the adjacency between tactical moves in game-like environments. The application of ring attractors in the action selection process involves mapping actions to specific locations on the ring and decoding the selected action based on neural activity. We investigate the application of ring attractors by both building an exogenous model and integrating them as part of DRL agents. Our approach significantly improves state-of-the-art performance on the Atari 100k benchmark, achieving a 53% increase in performance across selected state-of-the-art baselines. Codebase available at https://anonymous.4open.science/r/RA_RL-8026.
Submitted 14 February, 2025; v1 submitted 3 October, 2024;
originally announced October 2024.
-
ICML Topological Deep Learning Challenge 2024: Beyond the Graph Domain
Authors:
Guillermo Bernárdez,
Lev Telyatnikov,
Marco Montagna,
Federica Baccini,
Mathilde Papillon,
Miquel Ferriol-Galmés,
Mustafa Hajij,
Theodore Papamarkou,
Maria Sofia Bucarelli,
Olga Zaghen,
Johan Mathe,
Audun Myers,
Scott Mahan,
Hansen Lillemark,
Sharvaree Vadgama,
Erik Bekkers,
Tim Doster,
Tegan Emerson,
Henry Kvinge,
Katrina Agate,
Nesreen K Ahmed,
Pengfei Bai,
Michael Banf,
Claudio Battiloro,
Maxim Beketov
, et al. (48 additional authors not shown)
Abstract:
This paper describes the 2nd edition of the ICML Topological Deep Learning Challenge that was hosted within the ICML 2024 ELLIS Workshop on Geometry-grounded Representation Learning and Generative Modeling (GRaM). The challenge focused on the problem of representing data in different discrete topological domains in order to bridge the gap between Topological Deep Learning (TDL) and other types of structured datasets (e.g. point clouds, graphs). Specifically, participants were asked to design and implement topological liftings, i.e. mappings between different data structures and topological domains, such as hypergraphs or simplicial/cell/combinatorial complexes. The challenge received 52 submissions satisfying all the requirements. This paper introduces the main scope of the challenge and summarizes the main results and findings.
Submitted 8 September, 2024;
originally announced September 2024.
-
TopoBench: A Framework for Benchmarking Topological Deep Learning
Authors:
Lev Telyatnikov,
Guillermo Bernardez,
Marco Montagna,
Mustafa Hajij,
Martin Carrasco,
Pavlo Vasylenko,
Mathilde Papillon,
Ghada Zamzmi,
Michael T. Schaub,
Jonas Verhellen,
Pavel Snopov,
Bertran Miquel-Oliver,
Manel Gil-Sorribes,
Alexis Molina,
Victor Guallar,
Theodore Long,
Julian Suk,
Patryk Rygiel,
Alexander Nikitin,
Giordan Escalona,
Michael Banf,
Dominik Filipiak,
Max Schattauer,
Liliya Imasheva,
Alvaro Martinez
, et al. (12 additional authors not shown)
Abstract:
This work introduces TopoBench, an open-source library designed to standardize benchmarking and accelerate research in topological deep learning (TDL). TopoBench decomposes TDL into a sequence of independent modules for data generation, loading, transforming and processing, as well as model training, optimization and evaluation. This modular organization provides flexibility for modifications and facilitates the adaptation and optimization of various TDL pipelines. A key feature of TopoBench is its support for transformations and lifting across topological domains. Mapping the topology and features of a graph to higher-order topological domains, such as simplicial and cell complexes, enables richer data representations and more fine-grained analyses. The applicability of TopoBench is demonstrated by benchmarking several TDL architectures across diverse tasks and datasets.
Submitted 26 March, 2025; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Position: Topological Deep Learning is the New Frontier for Relational Learning
Authors:
Theodore Papamarkou,
Tolga Birdal,
Michael Bronstein,
Gunnar Carlsson,
Justin Curry,
Yue Gao,
Mustafa Hajij,
Roland Kwitt,
Pietro Liò,
Paolo Di Lorenzo,
Vasileios Maroulas,
Nina Miolane,
Farzana Nasrin,
Karthikeyan Natesan Ramamurthy,
Bastian Rieck,
Simone Scardapane,
Michael T. Schaub,
Petar Veličković,
Bei Wang,
Yusu Wang,
Guo-Wei Wei,
Ghada Zamzmi
Abstract:
Topological deep learning (TDL) is a rapidly evolving field that uses topological features to understand and design deep learning models. This paper posits that TDL is the new frontier for relational learning. TDL may complement graph representation learning and geometric deep learning by incorporating topological concepts, and can thus provide a natural choice for various machine learning settings. To this end, this paper discusses open problems in TDL, ranging from practical benefits to theoretical foundations. For each problem, it outlines potential solutions and future research opportunities. At the same time, this paper serves as an invitation to the scientific community to actively participate in TDL research to unlock the potential of this emerging field.
Submitted 6 August, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
TopoX: A Suite of Python Packages for Machine Learning on Topological Domains
Authors:
Mustafa Hajij,
Mathilde Papillon,
Florian Frantzen,
Jens Agerberg,
Ibrahem AlJabea,
Rubén Ballester,
Claudio Battiloro,
Guillermo Bernárdez,
Tolga Birdal,
Aiden Brent,
Peter Chin,
Sergio Escalera,
Simone Fiorellino,
Odin Hoff Gardaa,
Gurusankar Gopalakrishnan,
Devendra Govil,
Josef Hoppe,
Maneel Reddy Karri,
Jude Khouja,
Manuel Lecha,
Neal Livesay,
Jan Meißner,
Soham Mukherjee,
Alexander Nikitin,
Theodore Papamarkou
, et al. (18 additional authors not shown)
Abstract:
We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at https://pyt-team.github.io/.
Submitted 8 December, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?
Authors:
Emanuel Sommer,
Lisa Wimmer,
Theodore Papamarkou,
Ludwig Bothmann,
Bernd Bischl,
David Rügamer
Abstract:
A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks' parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, we establish practical guidelines for sampling and convergence diagnosis. As a result, we present a deep ensemble initialized approach as an effective solution with competitive performance and uncertainty quantification.
Submitted 27 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
Authors:
Theodore Papamarkou,
Maria Skoularidou,
Konstantina Palla,
Laurence Aitchison,
Julyan Arbel,
David Dunson,
Maurizio Filippone,
Vincent Fortuin,
Philipp Hennig,
José Miguel Hernández-Lobato,
Aliaksandr Hubin,
Alexander Immer,
Theofanis Karaletsos,
Mohammad Emtiyaz Khan,
Agustinus Kristiadi,
Yingzhen Li,
Stephan Mandt,
Christopher Nemeth,
Michael A. Osborne,
Tim G. J. Rudner,
David Rügamer,
Yee Whye Teh,
Max Welling,
Andrew Gordon Wilson,
Ruqi Zhang
Abstract:
In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.
Submitted 6 August, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Combinatorial Complexes: Bridging the Gap Between Cell Complexes and Hypergraphs
Authors:
Mustafa Hajij,
Ghada Zamzmi,
Theodore Papamarkou,
Aldo Guzmán-Sáenz,
Tolga Birdal,
Michael T. Schaub
Abstract:
Graph-based signal processing techniques have become essential for handling data in non-Euclidean spaces. However, there is a growing awareness that these graph models might need to be expanded into 'higher-order' domains to effectively represent the complex relations found in high-dimensional data. Such higher-order domains are typically modeled either as hypergraphs, or as simplicial, cubical or other cell complexes. In this context, cell complexes are often seen as a subclass of hypergraphs with additional algebraic structure that can be exploited, e.g., to develop a spectral theory. In this article, we promote an alternative perspective. We argue that hypergraphs and cell complexes emphasize different types of relations, which may have different utility depending on the application context. Whereas hypergraphs are effective in modeling set-type, multi-body relations between entities, cell complexes provide an effective means to model hierarchical, interior-to-boundary type relations. We discuss the relative advantages of these two choices and elaborate on the previously introduced concept of a combinatorial complex that enables co-existing set-type and hierarchical relations. Finally, we provide a brief numerical experiment to demonstrate that this modelling flexibility can be advantageous in learning tasks.
Submitted 14 December, 2023;
originally announced December 2023.
-
Model-agnostic variable importance for predictive uncertainty: an entropy-based approach
Authors:
Danny Wood,
Theodore Papamarkou,
Matt Benatan,
Richard Allmendinger
Abstract:
In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the reasons for the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches to understand both the sources of uncertainty and their impact on model performance.
Submitted 16 August, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
ICML 2023 Topological Deep Learning Challenge: Design and Results
Authors:
Mathilde Papillon,
Mustafa Hajij,
Helen Jenne,
Johan Mathe,
Audun Myers,
Theodore Papamarkou,
Tolga Birdal,
Tamal Dey,
Tim Doster,
Tegan Emerson,
Gurusankar Gopalakrishnan,
Devendra Govil,
Aldo Guzmán-Sáenz,
Henry Kvinge,
Neal Livesay,
Soham Mukherjee,
Shreyas N. Samaga,
Karthikeyan Natesan Ramamurthy,
Maneel Reddy Karri,
Paul Rosen,
Sophia Sanborn,
Robin Walters,
Jens Agerberg,
Sadrodin Barikbin,
Claudio Battiloro
, et al. (31 additional authors not shown)
Abstract:
This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the Python packages TopoNetX (data processing) and TopoModelX (deep learning). The challenge attracted twenty-eight qualifying submissions in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.
Submitted 18 January, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry
Authors:
Jonas Gregor Wiese,
Lisa Wimmer,
Theodore Papamarkou,
Bernd Bischl,
Stephan Günnemann,
David Rügamer
Abstract:
Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. Such symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.
Submitted 6 April, 2023;
originally announced April 2023.
-
Approximate blocked Gibbs sampling for Bayesian neural networks
Authors:
Theodore Papamarkou
Abstract:
In this work, minibatch MCMC sampling for feedforward neural networks is made more feasible. To this end, it is proposed to sample subgroups of parameters via a blocked Gibbs sampling scheme. By partitioning the parameter space, sampling is possible irrespective of layer width. It is also possible to alleviate vanishing acceptance rates for increasing depth by reducing the proposal variance in deeper layers. Increasing the length of a non-convergent chain increases the predictive accuracy in classification tasks, so avoiding vanishing acceptance rates and consequently enabling longer chain runs have practical benefits. Moreover, non-convergent chain realizations aid in the quantification of predictive uncertainty. An open problem is how to perform minibatch MCMC sampling for feedforward neural networks in the presence of augmented data.
Submitted 24 July, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Topological Deep Learning: Going Beyond Graph Data
Authors:
Mustafa Hajij,
Ghada Zamzmi,
Theodore Papamarkou,
Nina Miolane,
Aldo Guzmán-Sáenz,
Karthikeyan Natesan Ramamurthy,
Tolga Birdal,
Tamal K. Dey,
Soham Mukherjee,
Shreyas N. Samaga,
Neal Livesay,
Robin Walters,
Paul Rosen,
Michael T. Schaub
Abstract:
Topological deep learning is a rapidly growing field that pertains to the development of deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize many domains encountered in scientific computations. In this paper, we present a unifying deep learning framework built upon a richer data structure that includes widely adopted topological domains.
Specifically, we first introduce combinatorial complexes, a novel type of topological domain. Combinatorial complexes can be seen as generalizations of graphs that maintain certain desirable properties. Similar to hypergraphs, combinatorial complexes impose no constraints on the set of relations. In addition, combinatorial complexes permit the construction of hierarchical higher-order relations, analogous to those found in simplicial and cell complexes. Thus, combinatorial complexes generalize and combine useful traits of both hypergraphs and cell complexes, which have emerged as two promising abstractions that facilitate the generalization of graph neural networks to topological spaces.
Second, building upon combinatorial complexes and their rich combinatorial and algebraic structure, we develop a general class of message-passing combinatorial complex neural networks (CCNNs), focusing primarily on attention-based CCNNs. We characterize permutation and orientation equivariances of CCNNs, and discuss pooling and unpooling operations within CCNNs in detail.
Third, we evaluate the performance of CCNNs on tasks related to mesh shape analysis and graph learning. Our experiments demonstrate that CCNNs have competitive performance as compared to state-of-the-art deep learning models specifically tailored to the same tasks. Our findings demonstrate the advantages of incorporating higher-order relations into deep learning models in different applications.
Submitted 19 May, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Probability-Generating Function Kernels for Spherical Data
Authors:
Theodore Papamarkou,
Alexey Lindo
Abstract:
Probability-generating function (PGF) kernels are introduced, which constitute a class of kernels supported on the unit hypersphere, for the purposes of spherical data analysis. PGF kernels generalize RBF kernels in the context of spherical data. The properties of PGF kernels are studied. A semi-parametric learning algorithm is introduced to enable the use of PGF kernels with spherical data.
Submitted 1 February, 2024; v1 submitted 1 December, 2021;
originally announced December 2021.
-
A Random Persistence Diagram Generator
Authors:
Theodore Papamarkou,
Farzana Nasrin,
Austin Lawson,
Na Gong,
Orlando Rios,
Vasileios Maroulas
Abstract:
Topological data analysis (TDA) studies the shape patterns of data. Persistent homology is a widely used method in TDA that summarizes homological features of data at multiple scales and stores them in persistence diagrams (PDs). In this paper, we propose a random persistence diagram generator (RPDG) method that generates a sequence of random PDs from the ones produced by the data. RPDG is underpinned by a model based on pairwise interacting point processes, and a reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithm. A first example, which is based on a synthetic dataset, demonstrates the efficacy of RPDG and provides a comparison with another method for sampling PDs. A second example demonstrates the utility of RPDG to solve a materials science problem given a real dataset of small sample size.
Submitted 14 September, 2022; v1 submitted 15 April, 2021;
originally announced April 2021.
-
Simplicial Complex Representation Learning
Authors:
Mustafa Hajij,
Ghada Zamzmi,
Theodore Papamarkou,
Vasileios Maroulas,
Xuanting Cai
Abstract:
Simplicial complexes form an important class of topological spaces that are frequently used in many application areas such as computer-aided design, computer graphics, and simulation. Representation learning on graphs, which are just 1-d simplicial complexes, has attracted great attention in recent years. However, there has not been enough effort to extend representation learning to higher-dimensional simplicial objects due to the additional complexity these objects hold, especially when it comes to entire-simplicial complex representation learning. In this work, we propose a method for simplicial complex-level representation learning that embeds a simplicial complex into a universal embedding space in a way that complex-to-complex proximity is preserved. Our method uses our novel geometric message passing schemes to learn an entire simplicial complex representation in an end-to-end fashion. We demonstrate the proposed model on a publicly available mesh dataset. To the best of our knowledge, this work presents the first method for learning simplicial complex-level representations.
Submitted 1 February, 2022; v1 submitted 6 March, 2021;
originally announced March 2021.
-
Bayesian neural networks and dimensionality reduction
Authors:
Deborshee Sen,
Theodore Papamarkou,
David Dunson
Abstract:
In conducting non-linear dimensionality reduction and feature learning, it is common to suppose that the data lie near a lower-dimensional manifold. A class of model-based approaches for such problems includes latent variables in an unknown non-linear regression function; this includes Gaussian process latent variable models and variational auto-encoders (VAEs) as special cases. VAEs are artificial neural networks (ANNs) that employ approximations to make computation tractable; however, current implementations lack adequate uncertainty quantification in estimating the parameters, predictive densities, and lower-dimensional subspace, and can be unstable and lack interpretability in practice. We attempt to solve these problems by deploying Markov chain Monte Carlo (MCMC) sampling algorithms for Bayesian inference in ANN models with latent variables. We address issues of identifiability by imposing constraints on the ANN parameters as well as by using anchor points. This is demonstrated on simulated and real data examples. We find that current MCMC sampling schemes face fundamental challenges in neural networks involving latent variables, motivating new research directions.
Submitted 19 August, 2020; v1 submitted 18 August, 2020;
originally announced August 2020.
-
Hidden Markov models as recurrent neural networks: an application to Alzheimer's disease
Authors:
Matt Baucum,
Anahita Khojandi,
Theodore Papamarkou
Abstract:
Hidden Markov models (HMMs) are commonly used for disease progression modeling when the true patient health state is not fully known. Since HMMs typically have multiple local optima, incorporating additional patient covariates can improve parameter estimation and predictive performance. To allow for this, we develop hidden Markov recurrent neural networks (HMRNNs), a special case of recurrent neural networks that combine neural networks' flexibility with HMMs' interpretability. The HMRNN can be reduced to a standard HMM, with an identical likelihood function and parameter interpretations, but it can also combine an HMM with other predictive neural networks that take patient information as input. The HMRNN estimates all parameters simultaneously via gradient descent. Using a dataset of Alzheimer's disease patients, we demonstrate how the HMRNN can combine an HMM with other predictive neural networks to improve disease forecasting and to offer a novel clinical interpretation compared with a standard HMM trained via expectation-maximization.
Submitted 1 October, 2021; v1 submitted 4 June, 2020;
originally announced June 2020.
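The abstract above notes that the HMRNN reduces to a standard HMM with an identical likelihood function. As a rough illustration of why an HMM likelihood can be read as a recurrent computation, the sketch below implements the textbook forward recursion in plain Python. It is not the authors' HMRNN; the function name `hmm_likelihood` and the toy parameters are assumptions made for this example only.

```python
def hmm_likelihood(pi, A, B, obs):
    """Likelihood of an observation sequence under a discrete HMM.

    pi  : initial state probabilities, length n
    A   : n x n transition matrix, A[i][j] = P(next state j | state i)
    B   : n x m emission matrix, B[i][o] = P(observation o | state i)
    obs : list of observation indices

    The vector `alpha` plays the role of a recurrent hidden state:
    each step applies a linear map (A) followed by an elementwise
    gating by the emission probabilities of the current observation.
    """
    n = len(pi)
    # Initial step: prior over states weighted by the first emission.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recurrent step: propagate through A, then weight by the emission.
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    return sum(alpha)


# Toy two-state, two-symbol model (hypothetical numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood = hmm_likelihood(pi, A, B, [0, 1])
```

Because each update depends only on the previous `alpha` and the current observation, the loop has the same shape as a recurrent network cell, which is the structural observation the abstract builds on.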
-
Depth-2 Neural Networks Under a Data-Poisoning Attack
Authors:
Sayar Karmakar,
Anirbit Mukherjee,
Theodore Papamarkou
Abstract:
In this work, we study the possibility of defending against data-poisoning attacks while training a shallow neural network in a regression setup. We focus on doing supervised learning for a class of depth-2 finite-width neural networks, which includes single-filter convolutional networks. In this class of networks, we attempt to learn the network weights in the presence of a malicious oracle doing stochastic, bounded and additive adversarial distortions on the true output during training. For the non-gradient stochastic algorithm that we construct, we prove worst-case near-optimal trade-offs among the magnitude of the adversarial attack, the weight approximation accuracy, and the confidence achieved by the proposed algorithm. As our algorithm uses mini-batching, we analyze how the mini-batch size affects convergence. We also show how to utilize the scaling of the outer layer weights to counter output-poisoning attacks depending on the probability of attack. Lastly, we give experimental evidence demonstrating how our algorithm outperforms stochastic gradient descent under different input data distributions, including instances of heavy-tailed distributions.
Submitted 29 June, 2022; v1 submitted 4 May, 2020;
originally announced May 2020.
-
Automated detection of corrosion in used nuclear fuel dry storage canisters using residual neural networks
Authors:
Theodore Papamarkou,
Hayley Guy,
Bryce Kroencke,
Jordan Miller,
Preston Robinette,
Daniel Schultz,
Jacob Hinkle,
Laura Pullum,
Catherine Schuman,
Jeremy Renshaw,
Stylianos Chatzidakis
Abstract:
Nondestructive evaluation methods play an important role in ensuring component integrity and safety in many industries. Operator fatigue can play a critical role in the reliability of such methods. This is important for inspecting high-value assets or assets with a high consequence of failure, such as aerospace and nuclear components. Recent advances in convolutional neural networks can support and automate these inspection efforts. This paper proposes using residual neural networks (ResNets) for real-time detection of corrosion, including iron oxide discoloration, pitting and stress corrosion cracking, in dry storage stainless steel canisters housing used nuclear fuel. The proposed approach crops nuclear canister images into smaller tiles, trains a ResNet on these tiles, and classifies images as corroded or intact using the per-image count of tiles predicted as corroded by the ResNet. The results demonstrate that such a deep learning approach makes it possible to detect the locus of corrosion via smaller tiles and, at the same time, to infer with high accuracy whether an image comes from a corroded canister. The proposed approach thereby holds promise to automate and speed up nuclear fuel canister inspections, to minimize inspection costs, and to partially replace human-conducted onsite inspections, thus reducing radiation doses to personnel.
Submitted 13 July, 2020; v1 submitted 6 March, 2020;
originally announced March 2020.
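The image-level decision rule described in the abstract above (count the tiles predicted as corroded, then label the whole image) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `classify_image`, the tile probability cutoff `tile_threshold`, and the count cutoff `min_corroded_tiles` are all assumptions, and the tile scores are taken as given rather than produced by a ResNet.

```python
from typing import Sequence, Tuple


def classify_image(
    tile_probs: Sequence[float],
    tile_threshold: float = 0.5,   # assumed cutoff for a tile to count as corroded
    min_corroded_tiles: int = 1,   # assumed cutoff on the per-image tile count
) -> Tuple[str, int]:
    """Label an image from per-tile corrosion scores.

    tile_probs holds one score per tile, e.g. the predicted probability
    of corrosion from a tile-level classifier. The image is labelled
    "corroded" when enough tiles exceed the tile threshold.
    """
    corroded_count = sum(1 for p in tile_probs if p >= tile_threshold)
    label = "corroded" if corroded_count >= min_corroded_tiles else "intact"
    return label, corroded_count


# Hypothetical scores for a four-tile image.
label, count = classify_image([0.1, 0.9, 0.7, 0.2], min_corroded_tiles=2)
```

Keeping the per-tile scores alongside the image label also preserves the location of the suspected corrosion, since each flagged tile corresponds to a known crop of the original image.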
-
Wide Neural Networks with Bottlenecks are Deep Gaussian Processes
Authors:
Devanshu Agrawal,
Theodore Papamarkou,
Jacob Hinkle
Abstract:
There has recently been much work on the "wide limit" of neural networks, where Bayesian neural networks (BNNs) are shown to converge to a Gaussian process (GP) as all hidden layers are sent to infinite width. However, these results do not apply to architectures that require one or more of the hidden layers to remain narrow. In this paper, we consider the wide limit of BNNs where some hidden layers, called "bottlenecks", are held at finite width. The result is a composition of GPs that we term a "bottleneck neural network Gaussian process" (bottleneck NNGP). Although intuitive, the subtlety of the proof is in showing that the wide limit of a composition of networks is in fact the composition of the limiting GPs. We also analyze theoretically a single-bottleneck NNGP, finding that the bottleneck induces dependence between the outputs of a multi-output network that persists through extreme post-bottleneck depths, and prevents the kernel of the network from losing discriminative power at extreme post-bottleneck depths.
Submitted 6 July, 2020; v1 submitted 3 January, 2020;
originally announced January 2020.
-
Challenges in Markov chain Monte Carlo for Bayesian neural networks
Authors:
Theodore Papamarkou,
Jacob Hinkle,
M. Todd Young,
David Womble
Abstract:
Markov chain Monte Carlo (MCMC) methods have not been broadly adopted in Bayesian neural networks (BNNs). This paper first reviews the main challenges in sampling from the parameter posterior of a neural network via MCMC. Such challenges culminate in a lack of convergence to the parameter posterior. Nevertheless, this paper shows that a non-converged Markov chain, generated via MCMC sampling from the parameter space of a neural network, can yield via Bayesian marginalization a valuable posterior predictive distribution of the output of the neural network. Classification examples based on multilayer perceptrons showcase highly accurate posterior predictive distributions. The postulate of limited scope for MCMC developments in BNNs is partially valid; an asymptotically exact parameter posterior seems less plausible, yet an accurate posterior predictive distribution is a tenable research avenue.
Submitted 1 October, 2021; v1 submitted 15 October, 2019;
originally announced October 2019.
-
Distributions.jl: Definition and Modeling of Probability Distributions in the JuliaStats Ecosystem
Authors:
Mathieu Besançon,
Theodore Papamarkou,
David Anthoff,
Alex Arslan,
Simon Byrne,
Dahua Lin,
John Pearson
Abstract:
Random variables and their distributions are a central part of many areas of statistical methods. The Distributions.jl package provides Julia users and developers with tools for working with probability distributions, leveraging Julia features for their intuitive and flexible manipulation, while remaining highly efficient through zero-cost abstractions.
Submitted 12 July, 2021; v1 submitted 19 July, 2019;
originally announced July 2019.
-
Forward-Mode Automatic Differentiation in Julia
Authors:
Jarrett Revels,
Miles Lubin,
Theodore Papamarkou
Abstract:
We present ForwardDiff, a Julia package for forward-mode automatic differentiation (AD) featuring performance competitive with low-level languages like C++. Unlike recently developed AD tools in other popular high-level languages such as Python and MATLAB, ForwardDiff takes advantage of just-in-time (JIT) compilation to transparently recompile AD-unaware user code, enabling efficient support for higher-order differentiation and differentiation using custom number types (including complex numbers). For gradient and Jacobian calculations, ForwardDiff provides a variant of vector-forward mode that avoids expensive heap allocation and makes better use of memory bandwidth than traditional vector mode. In our numerical experiments, we demonstrate that for nontrivially large dimensions, ForwardDiff's gradient computations can be faster than a reverse-mode implementation from the Python-based autograd package. We also illustrate how ForwardDiff is used effectively within JuMP, a modeling language for optimization. According to our usage statistics, 41 unique repositories on GitHub depend on ForwardDiff, with users from diverse fields such as astronomy, optimization, finite element analysis, and statistics.
This document is an extended abstract that has been accepted for presentation at the AD2016 7th International Conference on Algorithmic Differentiation.
Submitted 26 July, 2016;
originally announced July 2016.
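For readers unfamiliar with forward-mode automatic differentiation, the core idea behind the abstract above can be shown with dual numbers: each value carries its derivative along, and arithmetic rules propagate both. The sketch below is a teaching example in Python, not a port of ForwardDiff; the `Dual` class and `derivative` helper are hypothetical names, and only addition and multiplication are supported.

```python
class Dual:
    """A number of the form value + deriv * eps, where eps**2 == 0."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(x):
        # Treat plain numbers as constants (zero derivative).
        return x if isinstance(x, Dual) else Dual(x, 0.0)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u v' + u' v
        return Dual(
            self.value * other.value,
            self.value * other.deriv + self.deriv * other.value,
        )

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate df/dx at x by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).deriv


# d/dx (x^3 + 2x) = 3x^2 + 2, evaluated at x = 3.
slope = derivative(lambda x: x * x * x + 2 * x, 3.0)
```

The function passed to `derivative` is ordinary, differentiation-unaware code; it works because the arithmetic operators dispatch on the `Dual` type. The abstract describes ForwardDiff achieving the analogous effect in Julia by recompiling AD-unaware user code via JIT compilation.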