Search | arXiv e-print repository

Data-driven approaches to inverse problems

Authors: Carola-Bibiane Schönlieb, Zakhar Shumaylov

Abstract: Inverse problems are concerned with the reconstruction of unknown physical quantities using indirect measurements and are fundamental across diverse fields such as medical imaging, remote sensing, and material sciences. These problems serve as critical tools for visualizing internal structures beyond what is visible to the naked eye, enabling quantification, diagnosis, prediction, and discovery. H… ▽ More Inverse problems are concerned with the reconstruction of unknown physical quantities using indirect measurements and are fundamental across diverse fields such as medical imaging, remote sensing, and material sciences. These problems serve as critical tools for visualizing internal structures beyond what is visible to the naked eye, enabling quantification, diagnosis, prediction, and discovery. However, most inverse problems are ill-posed, necessitating robust mathematical treatment to yield meaningful solutions. While classical approaches provide mathematically rigorous and computationally stable solutions, they are constrained by the ability to accurately model solution properties and implement them efficiently. A more recent paradigm considers deriving solutions to inverse problems in a data-driven manner. Instead of relying on classical mathematical modeling, this approach utilizes highly over-parameterized models, typically deep neural networks, which are adapted to specific inverse problems using carefully selected training data. Current approaches that follow this new paradigm distinguish themselves through solution accuracy paired with computational efficiency that was previously inconceivable. These notes offer an introduction to this data-driven paradigm for inverse problems. The first part of these notes will provide an introduction to inverse problems, discuss classical solution strategies, and present some applications. The second part will delve into modern data-driven approaches, with a particular focus on adversarial regularization and provably convergent linear plug-and-play denoisers. Throughout the presentation of these methodologies, their theoretical properties will be discussed, and numerical examples will be provided. The lecture series will conclude with a discussion of open problems and future perspectives in the field. △ Less

Submitted 13 June, 2025; originally announced June 2025.

Comments: Notes from Machine Learning: From Data to Mathematical Understanding (CIME 2023)

arXiv:2505.19873 [pdf, ps, other]

Deep Spectral Prior

Authors: Yanqi Cheng, Tieyong Zeng, Pietro Lio, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: We introduce Deep Spectral Prior (DSP), a new formulation of Deep Image Prior (DIP) that redefines image reconstruction as a frequency-domain alignment problem. Unlike traditional DIP, which relies on pixel-wise loss and early stopping to mitigate overfitting, DSP directly matches Fourier coefficients between the network output and observed measurements. This shift introduces an explicit inductive… ▽ More We introduce Deep Spectral Prior (DSP), a new formulation of Deep Image Prior (DIP) that redefines image reconstruction as a frequency-domain alignment problem. Unlike traditional DIP, which relies on pixel-wise loss and early stopping to mitigate overfitting, DSP directly matches Fourier coefficients between the network output and observed measurements. This shift introduces an explicit inductive bias towards spectral coherence, aligning with the known frequency structure of images and the spectral bias of convolutional neural networks. We provide a rigorous theoretical framework demonstrating that DSP acts as an implicit spectral regulariser, suppressing high-frequency noise by design and eliminating the need for early stopping. Our analysis spans four core dimensions establishing smooth convergence dynamics, local stability, and favourable bias-variance tradeoffs. We further show that DSP naturally projects reconstructions onto a frequency-consistent manifold, enhancing interpretability and robustness. These theoretical guarantees are supported by empirical results across denoising, inpainting, and super-resolution tasks, where DSP consistently outperforms classical DIP and other unsupervised baselines. △ Less

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2505.12940 [pdf, other]

Multi-Level Monte Carlo Training of Neural Operators

Authors: James Rowbottom, Stefania Fresca, Pietro Lio, Carola-Bibiane Schönlieb, Nicolas Boullé

Abstract: Operator learning is a rapidly growing field that aims to approximate nonlinear operators related to partial differential equations (PDEs) using neural operators. These rely on discretization of input and output functions and are, usually, expensive to train for large-scale problems at high-resolution. Motivated by this, we present a Multi-Level Monte Carlo (MLMC) approach to train neural operator… ▽ More Operator learning is a rapidly growing field that aims to approximate nonlinear operators related to partial differential equations (PDEs) using neural operators. These rely on discretization of input and output functions and are, usually, expensive to train for large-scale problems at high-resolution. Motivated by this, we present a Multi-Level Monte Carlo (MLMC) approach to train neural operators by leveraging a hierarchy of resolutions of function dicretization. Our framework relies on using gradient corrections from fewer samples of fine-resolution data to decrease the computational cost of training while maintaining a high level accuracy. The proposed MLMC training procedure can be applied to any architecture accepting multi-resolution data. Our numerical experiments on a range of state-of-the-art models and test-cases demonstrate improved computational efficiency compared to traditional single-resolution training approaches, and highlight the existence of a Pareto curve between accuracy and computational time, related to the number of samples per resolution. △ Less

Submitted 19 May, 2025; originally announced May 2025.

Comments: 18 pages, 8 figures

arXiv:2505.12003 [pdf, ps, other]

Approximation theory for 1-Lipschitz ResNets

Authors: Davide Murari, Takashi Furuya, Carola-Bibiane Schönlieb

Abstract: 1-Lipschitz neural networks are fundamental for generative modelling, inverse problems, and robust classifiers. In this paper, we focus on 1-Lipschitz residual networks (ResNets) based on explicit Euler steps of negative gradient flows and study their approximation capabilities. Leveraging the Restricted Stone-Weierstrass Theorem, we first show that these 1-Lipschitz ResNets are dense in the set o… ▽ More 1-Lipschitz neural networks are fundamental for generative modelling, inverse problems, and robust classifiers. In this paper, we focus on 1-Lipschitz residual networks (ResNets) based on explicit Euler steps of negative gradient flows and study their approximation capabilities. Leveraging the Restricted Stone-Weierstrass Theorem, we first show that these 1-Lipschitz ResNets are dense in the set of scalar 1-Lipschitz functions on any compact domain when width and depth are allowed to grow. We also show that these networks can exactly represent scalar piecewise affine 1-Lipschitz functions. We then prove a stronger statement: by inserting norm-constrained linear maps between the residual blocks, the same density holds when the hidden width is fixed. Because every layer obeys simple norm constraints, the resulting models can be trained with off-the-shelf optimisers. This paper provides the first universal approximation guarantees for 1-Lipschitz ResNets, laying a rigorous foundation for their practical use. △ Less

Submitted 17 May, 2025; originally announced May 2025.

MSC Class: 68T07

arXiv:2505.01388 [pdf, ps, other]

Potential Contrast: Properties, Equivalences, and Generalization to Multiple Classes

Authors: Wallace Peaslee, Anna Breger, Carola-Bibiane Schönlieb

Abstract: Potential contrast is typically used as an image quality measure and quantifies the maximal possible contrast between samples from two classes of pixels in an image after an arbitrary grayscale transformation. It has been valuable in cultural heritage applications, identifying and visualizing relevant information in multispectral images while requiring a small number of pixels to be manually sampl… ▽ More Potential contrast is typically used as an image quality measure and quantifies the maximal possible contrast between samples from two classes of pixels in an image after an arbitrary grayscale transformation. It has been valuable in cultural heritage applications, identifying and visualizing relevant information in multispectral images while requiring a small number of pixels to be manually sampled. In this work, we introduce a normalized version of potential contrast that removes dependence on image format and also prove equalities that enable generalization to more than two classes and to continuous settings. Finally, we exemplify the utility of multi-class normalized potential contrast through an application to a medieval music manuscript with visible bleedthrough from the back page. We share our implementations, based on both original algorithms and our new equalities, including generalization to multiple classes, at https://github.com/wallacepeaslee/Multiple-Class-Normalized-Potential-Contrast. △ Less

Submitted 2 May, 2025; originally announced May 2025.

arXiv:2504.04997 [pdf, other]

SurvSurf: a partially monotonic neural network for first-hitting time prediction of intermittently observed discrete and continuous sequential events

Authors: Yichen Kelly Chen, Sören Dittmer, Kinga Bernatowicz, Josep Arús-Pous, Kamen Bliznashki, John Aston, James H. F. Rudd, Carola-Bibiane Schönlieb, James Jones, Michael Roberts

Abstract: We propose a neural-network based survival model (SurvSurf) specifically designed for direct and simultaneous probabilistic prediction of the first hitting time of sequential events from baseline. Unlike existing models, SurvSurf is theoretically guaranteed to never violate the monotonic relationship between the cumulative incidence functions of sequential events, while allowing nonlinear influenc… ▽ More We propose a neural-network based survival model (SurvSurf) specifically designed for direct and simultaneous probabilistic prediction of the first hitting time of sequential events from baseline. Unlike existing models, SurvSurf is theoretically guaranteed to never violate the monotonic relationship between the cumulative incidence functions of sequential events, while allowing nonlinear influence from predictors. It also incorporates implicit truths for unobserved intermediate events in model fitting, and supports both discrete and continuous time and events. We also identified a variant of the Integrated Brier Score (IBS) that showed robust correlation with the mean squared error (MSE) between the true and predicted probabilities by accounting for implied truths about the missing intermediate events. We demonstrated the superiority of SurvSurf compared to modern and traditional predictive survival models in two simulated datasets and two real-world datasets, using MSE, the more robust IBS and by measuring the extent of monotonicity violation. △ Less

Submitted 7 April, 2025; originally announced April 2025.

Comments: 41 pages, 18 figures (including supplemental information). Submitted to RSS: Data Science and Artificial Intelligence

MSC Class: 62N01

arXiv:2412.10249 [pdf, other]

Stochastic Multiresolution Image Sketching for Inverse Imaging Problems

Authors: Alessandro Perelli, Carola-Bibiane Schonlieb, Matthias J. Ehrhardt

Abstract: A challenge in high-dimensional inverse problems is developing iterative solvers to find the accurate solution of regularized optimization problems with low computational cost. An important example is computed tomography (CT) where both image and data sizes are large and therefore the forward model is costly to evaluate. Since several years algorithms from stochastic optimization are used for tomo… ▽ More A challenge in high-dimensional inverse problems is developing iterative solvers to find the accurate solution of regularized optimization problems with low computational cost. An important example is computed tomography (CT) where both image and data sizes are large and therefore the forward model is costly to evaluate. Since several years algorithms from stochastic optimization are used for tomographic image reconstruction with great success by subsampling the data. Here we propose a novel way how stochastic optimization can be used to speed up image reconstruction by means of image domain sketching such that at each iteration an image of different resolution is being used. Hence, we coin this algorithm ImaSk. By considering an associated saddle-point problem, we can formulate ImaSk as a gradient-based algorithm where the gradient is approximated in the same spirit as the stochastic average gradient amélioré (SAGA) and uses at each iteration one of these multiresolution operators at random. We prove that ImaSk is linearly converging for linear forward models with strongly convex regularization functions. Numerical simulations on CT show that ImaSk is effective and increasing the number of multiresolution operators reduces the computational time to reach the modeled solution. △ Less

Submitted 13 December, 2024; originally announced December 2024.

Comments: 26 pages, 11 figures, submitted to SIAM Journal on Imaging Sciences

MSC Class: 49M29; 65K05; 65F22 ACM Class: G.1.6; G.1.3

arXiv:2412.10129 [pdf, other]

TIGRE v3: Efficient and easy to use iterative computed tomographic reconstruction toolbox for real datasets

Authors: Ander Biguri, Tomoyuki Sadakane, Reuben Lindroos, Yi Liu, Malena Sabaté Landman, Yi Du, Manasavee Lohvithee, Stefanie Kaser, Sepideh Hatamikia, Robert Bryll, Emilien Valat, Sarinrat Wonglee, Thomas Blumensath, Carola-Bibiane Schönlieb

Abstract: Computed Tomography (CT) has been widely adopted in medicine and it is increasingly being used in scientific and industrial applications. Parallelly, research in different mathematical areas concerning discrete inverse problems has led to the development of new sophisticated numerical solvers that can be applied in the context of CT. The Tomographic Iterative GPU-based Reconstruction (TIGRE) toolb… ▽ More Computed Tomography (CT) has been widely adopted in medicine and it is increasingly being used in scientific and industrial applications. Parallelly, research in different mathematical areas concerning discrete inverse problems has led to the development of new sophisticated numerical solvers that can be applied in the context of CT. The Tomographic Iterative GPU-based Reconstruction (TIGRE) toolbox was born almost a decade ago precisely in the gap between mathematics and high performance computing for real CT data, providing user-friendly open-source software tools for image reconstruction. However, since its inception, the tools' features and codebase have had over a twenty-fold increase, and are now including greater geometric flexibility, a variety of modern algorithms for image reconstruction, high-performance computing features and support for other CT modalities, like proton CT. The purpose of this work is two-fold: first, it provides a structured overview of the current version of the TIGRE toolbox, providing appropriate descriptions and references, and serving as a comprehensive and peer-reviewed guide for the user; second, it is an opportunity to illustrate the performance of several of the available solvers showcasing real CT acquisitions, which are typically not be openly available to algorithm developers. △ Less

Submitted 13 December, 2024; originally announced December 2024.

arXiv:2410.18262 [pdf, other]

Hamiltonian Matching for Symplectic Neural Integrators

Authors: Priscilla Canizares, Davide Murari, Carola-Bibiane Schönlieb, Ferdia Sherry, Zakhar Shumaylov

Abstract: Hamilton's equations of motion form a fundamental framework in various branches of physics, including astronomy, quantum mechanics, particle physics, and climate science. Classical numerical solvers are typically employed to compute the time evolution of these systems. However, when the system spans multiple spatial and temporal scales numerical errors can accumulate, leading to reduced accuracy.… ▽ More Hamilton's equations of motion form a fundamental framework in various branches of physics, including astronomy, quantum mechanics, particle physics, and climate science. Classical numerical solvers are typically employed to compute the time evolution of these systems. However, when the system spans multiple spatial and temporal scales numerical errors can accumulate, leading to reduced accuracy. To address the challenges of evolving such systems over long timescales, we propose SympFlow, a novel neural network-based symplectic integrator, which is the composition of a sequence of exact flow maps of parametrised time-dependent Hamiltonian functions. This architecture allows for a backward error analysis: we can identify an underlying Hamiltonian function of the architecture and use it to define a Hamiltonian matching objective function, which we use for training. In numerical experiments, we show that SympFlow exhibits promising results, with qualitative energy conservation behaviour similar to that of time-stepping symplectic integrators. △ Less

Submitted 23 October, 2024; originally announced October 2024.

Comments: NeurReps 2024

arXiv:2410.04543 [pdf, ps, other]

Pullback Flow Matching on Data Manifolds

Authors: Friso de Kruiff, Erik Bekkers, Ozan Öktem, Carola-Bibiane Schönlieb, Willem Diepeveen

Abstract: We propose Pullback Flow Matching (PFM), a novel framework for generative modeling on data manifolds. Unlike existing methods that assume or learn restrictive closed-form manifold mappings for training Riemannian Flow Matching (RFM) models, PFM leverages pullback geometry and isometric learning to preserve the underlying manifold's geometry while enabling efficient generation and precise interpola… ▽ More We propose Pullback Flow Matching (PFM), a novel framework for generative modeling on data manifolds. Unlike existing methods that assume or learn restrictive closed-form manifold mappings for training Riemannian Flow Matching (RFM) models, PFM leverages pullback geometry and isometric learning to preserve the underlying manifold's geometry while enabling efficient generation and precise interpolation in latent space. This approach not only facilitates closed-form mappings on the data manifold but also allows for designable latent spaces, using assumed metrics on both data and latent manifolds. By enhancing isometric learning through Neural ODEs and proposing a scalable training objective, we achieve a latent space more suitable for interpolation, leading to improved manifold learning and generative performance. We demonstrate PFM's effectiveness through applications in synthetic data, protein dynamics and protein sequence data, generating novel proteins with specific properties. This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest. △ Less

Submitted 9 July, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.02698 [pdf, other]

Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups

Authors: Zakhar Shumaylov, Peter Zaika, James Rowbottom, Ferdia Sherry, Melanie Weber, Carola-Bibiane Schönlieb

Abstract: The quest for robust and generalizable machine learning models has driven recent interest in exploiting symmetries through equivariant neural networks. In the context of PDE solvers, recent works have shown that Lie point symmetries can be a useful inductive bias for Physics-Informed Neural Networks (PINNs) through data and loss augmentation. Despite this, directly enforcing equivariance within th… ▽ More The quest for robust and generalizable machine learning models has driven recent interest in exploiting symmetries through equivariant neural networks. In the context of PDE solvers, recent works have shown that Lie point symmetries can be a useful inductive bias for Physics-Informed Neural Networks (PINNs) through data and loss augmentation. Despite this, directly enforcing equivariance within the model architecture for these problems remains elusive. This is because many PDEs admit non-compact symmetry groups, oftentimes not studied beyond their infinitesimal generators, making them incompatible with most existing equivariant architectures. In this work, we propose Lie aLgebrA Canonicalization (LieLAC), a novel approach that exploits only the action of infinitesimal generators of the symmetry group, circumventing the need for knowledge of the full group structure. To achieve this, we address existing theoretical issues in the canonicalization literature, establishing connections with frame averaging in the case of continuous non-compact groups. Operating within the framework of canonicalization, LieLAC can easily be integrated with unconstrained pre-trained models, transforming inputs to a canonical form before feeding them into the existing model, effectively aligning the input for model inference according to allowed symmetries. LieLAC utilizes standard Lie group descent schemes, achieving equivariance in pre-trained models. Finally, we showcase LieLAC's efficacy on tasks of invariant image classification and Lie point symmetry equivariant neural PDE solvers using pre-trained models. △ Less

Submitted 4 March, 2025; v1 submitted 3 October, 2024; originally announced October 2024.

Comments: 44 pages; accepted at ICLR 2025

arXiv:2410.02113 [pdf, other]

Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs

Authors: Chun-Wun Cheng, Jiahao Huang, Yi Zhang, Guang Yang, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

Abstract: Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitation… ▽ More Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitations, we introduce the Mamba Neural Operator (MNO), a novel framework that enhances neural operator-based techniques for solving PDEs. MNO establishes a formal theoretical connection between structured state-space models (SSMs) and neural operators, offering a unified structure that can adapt to diverse architectures, including Transformer-based models. By leveraging the structured design of SSMs, MNO captures long-range dependencies and continuous dynamics more effectively than traditional Transformers. Through extensive analysis, we show that MNO significantly boosts the expressive power and accuracy of neural operators, making it not just a complement but a superior framework for PDE-related tasks, bridging the gap between efficient representation and accurate solution approximation. △ Less

Submitted 9 April, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

arXiv:2410.01950 [pdf, other]

Score-based Pullback Riemannian Geometry: Extracting the Data Manifold Geometry using Anisotropic Flows

Authors: Willem Diepeveen, Georgios Batzolis, Zakhar Shumaylov, Carola-Bibiane Schönlieb

Abstract: Data-driven Riemannian geometry has emerged as a powerful tool for interpretable representation learning, offering improved efficiency in downstream tasks. Moving forward, it is crucial to balance cheap manifold mappings with efficient training algorithms. In this work, we integrate concepts from pullback Riemannian geometry and generative models to propose a framework for data-driven Riemannian g… ▽ More Data-driven Riemannian geometry has emerged as a powerful tool for interpretable representation learning, offering improved efficiency in downstream tasks. Moving forward, it is crucial to balance cheap manifold mappings with efficient training algorithms. In this work, we integrate concepts from pullback Riemannian geometry and generative models to propose a framework for data-driven Riemannian geometry that is scalable in both geometry and learning: score-based pullback Riemannian geometry. Focusing on unimodal distributions as a first step, we propose a score-based Riemannian structure with closed-form geodesics that pass through the data probability density. With this structure, we construct a Riemannian autoencoder (RAE) with error bounds for discovering the correct data manifold dimension. This framework can naturally be used with anisotropic normalizing flows by adopting isometry regularization during training. Through numerical experiments on diverse datasets, including image data, we demonstrate that the proposed framework produces high-quality geodesics passing through the data support, reliably estimates the intrinsic dimension of the data manifold, and provides a global chart of the manifold. To the best of our knowledge, this is the first scalable framework for extracting the complete geometry of the data manifold. △ Less

Submitted 22 May, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.01097 [pdf, other]

Nested Bregman Iterations for Decomposition Problems

Authors: Tobias Wolf, Derek Driggs, Kostas Papafitsoros, Elena Resmerita, Carola-Bibiane Schönlieb

Abstract: We consider the task of image reconstruction while simultaneously decomposing the reconstructed image into components with different features. A commonly used tool for this is a variational approach with an infimal convolution of appropriate functions as a regularizer. Especially for noise corrupted observations, incorporating these functionals into the classical method of Bregman iterations provi… ▽ More We consider the task of image reconstruction while simultaneously decomposing the reconstructed image into components with different features. A commonly used tool for this is a variational approach with an infimal convolution of appropriate functions as a regularizer. Especially for noise corrupted observations, incorporating these functionals into the classical method of Bregman iterations provides a robust method for obtaining an overall good approximation of the true image, by stopping early the iteration according to a discrepancy principle. However, crucially, the quality of the separate components depends further on the proper choice of the regularization weights associated to the infimally convoluted functionals. Here, we propose the method of Nested Bregman iterations to improve a decomposition in a structured way. This allows to transform the task of choosing the weights into the problem of stopping the iteration according to a meaningful criterion based on normalized cross-correlation. We discuss the well-definedness and the convergence behavior of the proposed method, and illustrate its strength numerically with various image decomposition tasks employing infimal convolution functionals. △ Less

Submitted 15 April, 2025; v1 submitted 2 September, 2024; originally announced September 2024.

arXiv:2408.06996 [pdf, other]

Blessing of Dimensionality for Approximating Sobolev Classes on Manifolds

Authors: Hong Ye Tan, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

Abstract: The manifold hypothesis says that natural high-dimensional data lie on or around a low-dimensional manifold. The recent success of statistical and learning-based methods in very high dimensions empirically supports this hypothesis, suggesting that typical worst-case analysis does not provide practical guarantees. A natural step for analysis is thus to assume the manifold hypothesis and derive boun… ▽ More The manifold hypothesis says that natural high-dimensional data lie on or around a low-dimensional manifold. The recent success of statistical and learning-based methods in very high dimensions empirically supports this hypothesis, suggesting that typical worst-case analysis does not provide practical guarantees. A natural step for analysis is thus to assume the manifold hypothesis and derive bounds that are independent of any ambient dimensions that the data may be embedded in. Theoretical implications in this direction have recently been explored in terms of generalization of ReLU networks and convergence of Langevin methods. In this work, we consider optimal uniform approximations with functions of finite statistical complexity. While upper bounds on uniform approximation exist in the literature using ReLU neural networks, we consider the opposite: lower bounds to quantify the fundamental difficulty of approximation on manifolds. In particular, we demonstrate that the statistical complexity required to approximate a class of bounded Sobolev functions on a compact manifold is bounded from below, and moreover that this bound is dependent only on the intrinsic properties of the manifold, such as curvature, volume, and injectivity radius. △ Less

Submitted 3 May, 2025; v1 submitted 13 August, 2024; originally announced August 2024.

MSC Class: 41A25; 41A46; 53Z50;

arXiv:2407.04516 [pdf, ps, other]

G-Adaptivity: optimised graph-based mesh relocation for finite element methods

Authors: James Rowbottom, Georg Maierhofer, Teo Deveney, Eike Mueller, Alberto Paganini, Katharina Schratz, Pietro Liò, Carola-Bibiane Schönlieb, Chris Budd

Abstract: We present a novel, and effective, approach to achieve optimal mesh relocation in finite element methods (FEMs). The cost and accuracy of FEMs is critically dependent on the choice of mesh points. Mesh relocation (r-adaptivity) seeks to optimise the mesh geometry to obtain the best solution accuracy at given computational budget. Classical r-adaptivity relies on the solution of a separate nonlinea… ▽ More We present a novel, and effective, approach to achieve optimal mesh relocation in finite element methods (FEMs). The cost and accuracy of FEMs is critically dependent on the choice of mesh points. Mesh relocation (r-adaptivity) seeks to optimise the mesh geometry to obtain the best solution accuracy at given computational budget. Classical r-adaptivity relies on the solution of a separate nonlinear "meshing" PDE to determine mesh point locations. This incurs significant cost at remeshing, and relies on estimates that relate interpolation- and FEM-error. Recent machine learning approaches have focused on the construction of fast surrogates for such classical methods. Instead, our new approach trains a graph neural network (GNN) to determine mesh point locations by directly minimising the FE solution error from the PDE system Firedrake to achieve higher solution accuracy. Our GNN architecture closely aligns the mesh solution space to that of classical meshing methodologies, thus replacing classical estimates for optimality with a learnable strategy. This allows for rapid and robust training and results in an extremely efficient and effective GNN approach to online r-adaptivity. Our method outperforms both classical, and prior ML, approaches to r-adaptive meshing. In particular, it achieves lower FE solution error, whilst retaining the significant speed-up over classical methods observed in prior ML work. △ Less

Submitted 21 June, 2025; v1 submitted 5 July, 2024; originally announced July 2024.

Journal ref: Proceedings of the 42nd International Conference on Machine Learning, 2025

arXiv:2406.02458 [pdf, other]

Deep Block Proximal Linearised Minimisation Algorithm for Non-convex Inverse Problems

Authors: Chaoyan Huang, Zhongming Wu, Yanqi Cheng, Tieyong Zeng, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

Abstract: Image restoration is typically addressed through non-convex inverse problems, which are often solved using first-order block-wise splitting methods. In this paper, we consider a general type of non-convex optimisation model that captures many inverse image problems and present an inertial block proximal linearised minimisation (iBPLM) algorithm. Our new method unifies the Jacobi-type parallel and… ▽ More Image restoration is typically addressed through non-convex inverse problems, which are often solved using first-order block-wise splitting methods. In this paper, we consider a general type of non-convex optimisation model that captures many inverse image problems and present an inertial block proximal linearised minimisation (iBPLM) algorithm. Our new method unifies the Jacobi-type parallel and the Gauss-Seidel-type alternating update rules, and extends beyond these approaches. The inertial technique is also incorporated into each block-wise subproblem update, which can accelerate numerical convergence. Furthermore, we extend this framework with a plug-and-play variant (PnP-iBPLM) that integrates deep gradient denoisers, offering a flexible and robust solution for complex imaging tasks. We provide comprehensive theoretical analysis, demonstrating both subsequential and global convergence of the proposed algorithms. To validate our methods, we apply them to multi-block dictionary learning problems in image denoising and deblurring. Experimental results show that both iBPLM and PnP-iBPLM significantly enhance numerical performance and robustness in these applications. △ Less

Submitted 13 November, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 6 figures, 3 tables

arXiv:2403.17100 [pdf, ps, other]

Practical Acceleration of the Condat-Vũ Algorithm

Authors: Derek Driggs, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb, Junqi Tang

Abstract: The Condat-Vũ algorithm is a widely used primal-dual method for optimizing composite objectives of three functions. Several algorithms for optimizing composite objectives of two functions are special cases of Condat-Vũ, including proximal gradient descent (PGD). It is well-known that PGD exhibits suboptimal performance, and a simple adjustment to PGD can accelerate its convergence rate from… ▽ More The Condat-Vũ algorithm is a widely used primal-dual method for optimizing composite objectives of three functions. Several algorithms for optimizing composite objectives of two functions are special cases of Condat-Vũ, including proximal gradient descent (PGD). It is well-known that PGD exhibits suboptimal performance, and a simple adjustment to PGD can accelerate its convergence rate from $\mathcal{O}(1/T)$ to $\mathcal{O}(1/T^2)$ on convex objectives, and this accelerated rate is optimal. In this work, we show that a simple adjustment to the Condat-Vũ algorithm allows it to recover accelerated PGD (APGD) as a special case, instead of PGD. We prove that this accelerated Condat--Vũ algorithm achieves optimal convergence rates and significantly outperforms the traditional Condat-Vũ algorithm in regimes where the Condat--Vũ algorithm approximates the dynamics of PGD. We demonstrate the effectiveness of our approach in various applications in machine learning and computational imaging. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.03541 [pdf, other]

HAMLET: Graph Transformer Neural Operator for Partial Differential Equations

Authors: Andrey Bryutkin, Jiahao Huang, Zhongying Deng, Guang Yang, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

Abstract: We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to… ▽ More We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to PDEs of arbitrary geometries and varied input formats. Notably, HAMLET scales effectively with increasing data complexity and noise, showcasing its robustness. HAMLET is not just tailored to a single type of physical simulation, but can be applied across various domains. Moreover, it boosts model resilience and performance, especially in scenarios with limited data. We demonstrate, through extensive experiments, that our framework is capable of outperforming current techniques for PDEs. △ Less

Submitted 2 October, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 18 pages, 7 figures, 6 tables

Journal ref: Proceedings of Machine Learning Research, Vol. 235, pp. 4624-4641, 2024

arXiv:2402.01052 [pdf, other]

Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation

Authors: Zakhar Shumaylov, Jeremy Budd, Subhadip Mukherjee, Carola-Bibiane Schönlieb

Abstract: Variational regularisation is the primary method for solving inverse problems, and recently there has been considerable work leveraging deeply learned regularisation for enhanced performance. However, few results exist addressing the convergence of such regularisation, particularly within the context of critical points as opposed to global minimisers. In this paper, we present a generalised formul… ▽ More Variational regularisation is the primary method for solving inverse problems, and recently there has been considerable work leveraging deeply learned regularisation for enhanced performance. However, few results exist addressing the convergence of such regularisation, particularly within the context of critical points as opposed to global minimisers. In this paper, we present a generalised formulation of convergent regularisation in terms of critical points, and show that this is achieved by a class of weakly convex regularisers. We prove convergence of the primal-dual hybrid gradient method for the associated variational problem, and, given a Kurdyka-Lojasiewicz condition, an $\mathcal{O}(\log{k}/k)$ ergodic convergence rate. Finally, applying this theory to learned regularisation, we prove universal approximation for input weakly convex neural networks (IWCNN), and show empirically that IWCNNs can lead to improved performance of learned adversarial regularisers for computed tomography (CT) reconstruction. △ Less

Submitted 15 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 26 pages, 4 figures; https://openreview.net/forum?id=E8FpcUyPuS

arXiv:2311.15996 [pdf, other]

Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation

Authors: Teo Deveney, Jan Stanczuk, Lisa Maria Kreusser, Chris Budd, Carola-Bibiane Schönlieb

Abstract: Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE based samples are inferior to SDE based sa… ▽ More Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE based samples are inferior to SDE based samples. In this paper we rigorously describe the range of dynamics and approximations that arise when training score-based diffusion models, including the true SDE dynamics, the neural approximations, the various approximate particle dynamics that result, as well as their associated Fokker--Planck equations and the neural network approximations of these Fokker--Planck equations. We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, and link it to an associated Fokker--Planck equation. We derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker--Planck residual. We also show numerically that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions which we demonstrate using explicit comparisons. Moreover, we show numerically that reducing the Fokker--Planck residual by adding it as an additional regularisation term leads to closing the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, however that this can come at the cost of degraded SDE sample quality. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2308.07818 [pdf, other]

Riemannian geometry for efficient analysis of protein dynamics data

Authors: Willem Diepeveen, Carlos Esteve-Yagüe, Jan Lellmann, Ozan Öktem, Carola-Bibiane Schönlieb

Abstract: An increasingly common viewpoint is that protein dynamics data sets reside in a non-linear subspace of low conformational energy. Ideal data analysis tools for such data sets should therefore account for such non-linear geometry. The Riemannian geometry setting can be suitable for a variety of reasons. First, it comes with a rich structure to account for a wide range of geometries that can be mode… ▽ More An increasingly common viewpoint is that protein dynamics data sets reside in a non-linear subspace of low conformational energy. Ideal data analysis tools for such data sets should therefore account for such non-linear geometry. The Riemannian geometry setting can be suitable for a variety of reasons. First, it comes with a rich structure to account for a wide range of geometries that can be modelled after an energy landscape. Second, many standard data analysis tools initially developed for data in Euclidean space can also be generalised to data on a Riemannian manifold. In the context of protein dynamics, a conceptual challenge comes from the lack of a suitable smooth manifold and the lack of guidelines for constructing a smooth Riemannian structure based on an energy landscape. In addition, computational feasibility in computing geodesics and related mappings poses a major challenge. This work considers these challenges. The first part of the paper develops a novel local approximation technique for computing geodesics and related mappings on Riemannian manifolds in a computationally feasible manner. The second part constructs a smooth manifold of point clouds modulo rigid body group actions and a Riemannian structure that is based on an energy landscape for protein conformations. The resulting Riemannian geometry is tested on several data analysis tasks relevant for protein dynamics data. It performs exceptionally well on coarse-grained molecular dynamics simulated data. In particular, the geodesics with given start- and end-points approximately recover corresponding molecular dynamics trajectories for proteins that undergo relatively ordered transitions with medium sized deformations. The Riemannian protein geometry also gives physically realistic summary statistics and retrieves the underlying dimension even for large-sized deformations within seconds on a laptop. △ Less

Submitted 26 October, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

MSC Class: 49Q10; 53C22; 53Z10; 53Z50; 65D18; 92-08; 92-10

arXiv:2308.05045 [pdf, other]

Boosting Data-Driven Mirror Descent with Randomization, Equivariance, and Acceleration

Authors: Hong Ye Tan, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

Abstract: Learning-to-optimize (L2O) is an emerging research area in large-scale optimization with applications in data science. Recently, researchers have proposed a novel L2O framework called learned mirror descent (LMD), based on the classical mirror descent (MD) algorithm with learnable mirror maps parameterized by input-convex neural networks. The LMD approach has been shown to significantly accelerate… ▽ More Learning-to-optimize (L2O) is an emerging research area in large-scale optimization with applications in data science. Recently, researchers have proposed a novel L2O framework called learned mirror descent (LMD), based on the classical mirror descent (MD) algorithm with learnable mirror maps parameterized by input-convex neural networks. The LMD approach has been shown to significantly accelerate convex solvers while inheriting the convergence properties of the classical MD algorithm. This work proposes several practical extensions of the LMD algorithm, addressing its instability, scalability, and feasibility for high-dimensional problems. We first propose accelerated and stochastic variants of LMD, leveraging classical momentum-based acceleration and stochastic optimization techniques for improving the convergence rate and per-iteration computational complexity. Moreover, for the particular application of training neural networks, we derive and propose a novel and efficient parameterization for the mirror potential, exploiting the equivariant structure of the training problems to significantly reduce the dimensionality of the underlying problem. We provide theoretical convergence guarantees for our schemes under standard assumptions and demonstrate their effectiveness in various computational imaging and machine learning applications such as image inpainting, and the training of support vector machines and deep neural networks. △ Less

Submitted 10 May, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

MSC Class: 46N10; 65K10; 65G50

arXiv:2307.13579 [pdf, other]

Reinterpreting survival analysis in the universal approximator age

Authors: Sören Dittmer, Michael Roberts, Jacobus Preller, AIX COVNET, James H. F. Rudd, John A. D. Aston, Carola-Bibiane Schönlieb

Abstract: Survival analysis is an integral part of the statistical toolbox. However, while most domains of classical statistics have embraced deep learning, survival analysis only recently gained some minor attention from the deep learning community. This recent development is likely in part motivated by the COVID-19 pandemic. We aim to provide the tools needed to fully harness the potential of survival ana… ▽ More Survival analysis is an integral part of the statistical toolbox. However, while most domains of classical statistics have embraced deep learning, survival analysis only recently gained some minor attention from the deep learning community. This recent development is likely in part motivated by the COVID-19 pandemic. We aim to provide the tools needed to fully harness the potential of survival analysis in deep learning. On the one hand, we discuss how survival analysis connects to classification and regression. On the other hand, we provide technical tools. We provide a new loss function, evaluation metrics, and the first universal approximating network that provably produces survival curves without numeric integration. We show that the loss function and model outperform other approaches using a large numerical study. △ Less

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.09441 [pdf, other]

Convergent regularization in inverse problems and linear plug-and-play denoisers

Authors: Andreas Hauptmann, Subhadip Mukherjee, Carola-Bibiane Schönlieb, Ferdia Sherry

Abstract: Plug-and-play (PnP) denoising is a popular iterative framework for solving imaging inverse problems using off-the-shelf image denoisers. Their empirical success has motivated a line of research that seeks to understand the convergence of PnP iterates under various assumptions on the denoiser. While a significant amount of research has gone into establishing the convergence of the PnP iteration for… ▽ More Plug-and-play (PnP) denoising is a popular iterative framework for solving imaging inverse problems using off-the-shelf image denoisers. Their empirical success has motivated a line of research that seeks to understand the convergence of PnP iterates under various assumptions on the denoiser. While a significant amount of research has gone into establishing the convergence of the PnP iteration for different regularity conditions on the denoisers, not much is known about the asymptotic properties of the converged solution as the noise level in the measurement tends to zero, i.e., whether PnP methods are provably convergent regularization schemes under reasonable assumptions on the denoiser. This paper serves two purposes: first, we provide an overview of the classical regularization theory in inverse problems and survey a few notable recent data-driven methods that are provably convergent regularization schemes. We then continue to discuss PnP algorithms and their established convergence guarantees. Subsequently, we consider PnP algorithms with linear denoisers and propose a novel spectral filtering technique to control the strength of regularization arising from the denoiser. Further, by relating the implicit regularization of the denoiser to an explicit regularization functional, we rigorously show that PnP with linear denoisers leads to a convergent regularization scheme. More specifically, we prove that in the limit as the noise vanishes, the PnP reconstruction converges to the minimizer of a regularization potential subject to the solution satisfying the noiseless operator equation. The theoretical analysis is corroborated by numerical experiments for the classical inverse problem of tomographic image reconstruction. △ Less

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2307.07344 [pdf, other]

Inverse Evolution Layers: Physics-informed Regularizers for Deep Neural Networks

Authors: Chaoyu Liu, Zhonghua Qiao, Chao Li, Carola-Bibiane Schönlieb

Abstract: Traditional image processing methods employing partial differential equations (PDEs) offer a multitude of meaningful regularizers, along with valuable theoretical foundations for a wide range of image-related tasks. This makes their integration into neural networks a promising avenue. In this paper, we introduce a novel regularization approach inspired by the reverse process of PDE-based evolution… ▽ More Traditional image processing methods employing partial differential equations (PDEs) offer a multitude of meaningful regularizers, along with valuable theoretical foundations for a wide range of image-related tasks. This makes their integration into neural networks a promising avenue. In this paper, we introduce a novel regularization approach inspired by the reverse process of PDE-based evolution models. Specifically, we propose inverse evolution layers (IELs), which serve as bad property amplifiers to penalize neural networks of which outputs have undesired characteristics. Using IELs, one can achieve specific regularization objectives and endow neural networks' outputs with corresponding properties of the PDE models. Our experiments, focusing on semantic segmentation tasks using heat-diffusion IELs, demonstrate their effectiveness in mitigating noisy label effects. Additionally, we develop curve-motion IELs to enforce convex shape regularization in neural network-based segmentation models for preventing the generation of concave outputs. Theoretical analysis confirms the efficacy of IELs as an effective regularization mechanism, particularly in handling training with label issues. △ Less

Submitted 1 July, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2306.17737 [pdf, other]

doi 10.1137/23M1593565

Proximal Langevin Sampling With Inexact Proximal Mapping

Authors: Matthias J. Ehrhardt, Lorenz Kuger, Carola-Bibiane Schönlieb

Abstract: In order to solve tasks like uncertainty quantification or hypothesis tests in Bayesian imaging inverse problems, we often have to draw samples from the arising posterior distribution. For the usually log-concave but high-dimensional posteriors, Markov chain Monte Carlo methods based on time discretizations of Langevin diffusion are a popular tool. If the potential defining the distribution is non… ▽ More In order to solve tasks like uncertainty quantification or hypothesis tests in Bayesian imaging inverse problems, we often have to draw samples from the arising posterior distribution. For the usually log-concave but high-dimensional posteriors, Markov chain Monte Carlo methods based on time discretizations of Langevin diffusion are a popular tool. If the potential defining the distribution is non-smooth, these discretizations are usually of an implicit form leading to Langevin sampling algorithms that require the evaluation of proximal operators. For some of the potentials relevant in imaging problems this is only possible approximately using an iterative scheme. We investigate the behaviour of a proximal Langevin algorithm under the presence of errors in the evaluation of proximal mappings. We generalize existing non-asymptotic and asymptotic convergence results of the exact algorithm to our inexact setting and quantify the bias between the target and the algorithm's stationary distribution due to the errors. We show that the additional bias stays bounded for bounded errors and converges to zero for decaying errors in a strongly convex setting. We apply the inexact algorithm to sample numerically from the posterior of typical imaging inverse problems in which we can only approximate the proximal operator by an iterative scheme and validate our theoretical convergence results. △ Less

Submitted 13 May, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

Comments: 29 pages, 11 figures

Journal ref: SIAM Journal on Imaging Sciences 2024 17:3, 1729-1760

arXiv:2304.08628 [pdf, other]

A sparse optimization approach to infinite infimal convolution regularization

Authors: Kristian Bredies, Marcello Carioni, Martin Holler, Yury Korolev, Carola-Bibiane Schönlieb

Abstract: In this paper we introduce the class of infinite infimal convolution functionals and apply these functionals to the regularization of ill-posed inverse problems. The proposed regularization involves an infimal convolution of a continuously parametrized family of convex, positively one-homogeneous functionals defined on a common Banach space $X$. We show that, under mild assumptions, this functiona… ▽ More In this paper we introduce the class of infinite infimal convolution functionals and apply these functionals to the regularization of ill-posed inverse problems. The proposed regularization involves an infimal convolution of a continuously parametrized family of convex, positively one-homogeneous functionals defined on a common Banach space $X$. We show that, under mild assumptions, this functional admits an equivalent convex lifting in the space of measures with values in $X$. This reformulation allows us to prove well-posedness of a Tikhonov regularized inverse problem and opens the door to a sparse analysis of the solutions. In the case of finite-dimensional measurements we prove a representer theorem, showing that there exists a solution of the inverse problem that is sparse, in the sense that it can be represented as a linear combination of the extremal points of the ball of the lifted infinite infimal convolution functional. Then, we design a generalized conditional gradient method for computing solutions of the inverse problem without relying on an a priori discretization of the parameter space and of the Banach space $X$. The iterates are constructed as linear combinations of the extremal points of the lifted infinite infimal convolution functional. We prove a sublinear rate of convergence for our algorithm and apply it to denoising of signals and images using, as regularizer, infinite infimal convolutions of fractional-Laplacian-type operators with adaptive orders of smoothness and anisotropies. △ Less

Submitted 15 December, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.08342 [pdf, other]

NF-ULA: Langevin Monte Carlo with Normalizing Flow Prior for Imaging Inverse Problems

Authors: Ziruo Cai, Junqi Tang, Subhadip Mukherjee, Jinglai Li, Carola Bibiane Schönlieb, Xiaoqun Zhang

Abstract: Bayesian methods for solving inverse problems are a powerful alternative to classical methods since the Bayesian approach offers the ability to quantify the uncertainty in the solution. In recent years, data-driven techniques for solving inverse problems have also been remarkably successful, due to their superior representation ability. In this work, we incorporate data-based models into a class o… ▽ More Bayesian methods for solving inverse problems are a powerful alternative to classical methods since the Bayesian approach offers the ability to quantify the uncertainty in the solution. In recent years, data-driven techniques for solving inverse problems have also been remarkably successful, due to their superior representation ability. In this work, we incorporate data-based models into a class of Langevin-based sampling algorithms for Bayesian inference in imaging inverse problems. In particular, we introduce NF-ULA (Normalizing Flow-based Unadjusted Langevin algorithm), which involves learning a normalizing flow (NF) as the image prior. We use NF to learn the prior because a tractable closed-form expression for the log prior enables the differentiation of it using autograd libraries. Our algorithm only requires a normalizing flow-based generative network, which can be pre-trained independently of the considered inverse problem and the forward operator. We perform theoretical analysis by investigating the well-posedness and non-asymptotic convergence of the resulting NF-ULA algorithm. The efficacy of the proposed NF-ULA algorithm is demonstrated in various image restoration problems such as image deblurring, image inpainting, and limited-angle X-ray computed tomography (CT) reconstruction. NF-ULA is found to perform better than competing methods for severely ill-posed inverse problems. △ Less

Submitted 14 October, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.07271 [pdf, other]

Provably Convergent Plug-and-Play Quasi-Newton Methods

Authors: Hong Ye Tan, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

Abstract: Plug-and-Play (PnP) methods are a class of efficient iterative methods that aim to combine data fidelity terms and deep denoisers using classical optimization algorithms, such as ISTA or ADMM, with applications in inverse problems and imaging. Provable PnP methods are a subclass of PnP methods with convergence guarantees, such as fixed point convergence or convergence to critical points of some en… ▽ More Plug-and-Play (PnP) methods are a class of efficient iterative methods that aim to combine data fidelity terms and deep denoisers using classical optimization algorithms, such as ISTA or ADMM, with applications in inverse problems and imaging. Provable PnP methods are a subclass of PnP methods with convergence guarantees, such as fixed point convergence or convergence to critical points of some energy function. Many existing provable PnP methods impose heavy restrictions on the denoiser or fidelity function, such as non-expansiveness or strict convexity, respectively. In this work, we propose a novel algorithmic approach incorporating quasi-Newton steps into a provable PnP framework based on proximal denoisers, resulting in greatly accelerated convergence while retaining light assumptions on the denoiser. By characterizing the denoiser as the proximal operator of a weakly convex function, we show that the fixed points of the proposed quasi-Newton PnP algorithm are critical points of a weakly convex function. Numerical experiments on image deblurring and super-resolution demonstrate 2--8x faster convergence as compared to other provable PnP methods with similar reconstruction quality. △ Less

Submitted 13 November, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

MSC Class: 49M15; 49J52; 65K15

arXiv:2303.03794 [pdf, other]

Hidden Knowledge: Mathematical Methods for the Extraction of the Fingerprint of Medieval Paper from Digital Images

Authors: Tamara G. Grossmann, Carola-Bibiane Schönlieb, Orietta Da Rold

Abstract: Medieval paper, a handmade product, is made with a mould which leaves an indelible imprint on the sheet of paper. This imprint includes chain lines, laid lines and watermarks which are often visible on the sheet. Extracting these features allows the identification of paper stock and gives information about chronology, localisation and movement of books and people. Most computational work for featu… ▽ More Medieval paper, a handmade product, is made with a mould which leaves an indelible imprint on the sheet of paper. This imprint includes chain lines, laid lines and watermarks which are often visible on the sheet. Extracting these features allows the identification of paper stock and gives information about chronology, localisation and movement of books and people. Most computational work for feature extraction of paper analysis has so far focused on radiography or transmitted light images. While these imaging methods provide clear visualisation for the features of interest, they are expensive and time consuming in their acquisition and not feasible for smaller institutions. However, reflected light images of medieval paper manuscripts are abundant and possibly cheaper in their acquisition. In this paper, we propose algorithms to detect and extract the laid and chain lines from reflected light images. We tackle the main drawback of reflected light images, that is, the low contrast attenuation of lines and intensity jumps due to noise and degradation, by employing the spectral total variation decomposition and develop methods for subsequent line extraction. Our results clearly demonstrate the feasibility of using reflected light images in paper analysis. This work enables the feature extraction for paper manuscripts that have otherwise not been analysed due to a lack of appropriate images. We also open the door for paper stock identification at scale. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.04107 [pdf, other]

Can Physics-Informed Neural Networks beat the Finite Element Method?

Authors: Tamara G. Grossmann, Urszula Julia Komorowska, Jonas Latz, Carola-Bibiane Schönlieb

Abstract: Partial differential equations play a fundamental role in the mathematical modelling of many processes and systems in physical, biological and other sciences. To simulate such processes and systems, the solutions of PDEs often need to be approximated numerically. The finite element method, for instance, is a usual standard methodology to do so. The recent success of deep neural networks at various… ▽ More Partial differential equations play a fundamental role in the mathematical modelling of many processes and systems in physical, biological and other sciences. To simulate such processes and systems, the solutions of PDEs often need to be approximated numerically. The finite element method, for instance, is a usual standard methodology to do so. The recent success of deep neural networks at various approximation tasks has motivated their use in the numerical solution of PDEs. These so-called physics-informed neural networks and their variants have shown to be able to successfully approximate a large range of partial differential equations. So far, physics-informed neural networks and the finite element method have mainly been studied in isolation of each other. In this work, we compare the methodologies in a systematic computational study. Indeed, we employ both methods to numerically solve various linear and nonlinear partial differential equations: Poisson in 1D, 2D, and 3D, Allen-Cahn in 1D, semilinear Schrödinger in 1D and 2D. We then compare computational costs and approximation accuracies. In terms of solution time and accuracy, physics-informed neural networks have not been able to outperform the finite element method in our study. In some experiments, they were faster at evaluating the solved PDE. △ Less

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2301.02511 [pdf, ps, other]

Stochastic Primal Dual Hybrid Gradient Algorithm with Adaptive Step-Sizes

Authors: Antonin Chambolle, Claire Delplancke, Matthias J. Ehrhardt, Carola-Bibiane Schönlieb, Junqi Tang

Abstract: In this work we propose a new primal-dual algorithm with adaptive step-sizes. The stochastic primal-dual hybrid gradient (SPDHG) algorithm with constant step-sizes has become widely applied in large-scale convex optimization across many scientific fields due to its scalability. While the product of the primal and dual step-sizes is subject to an upper-bound in order to ensure convergence, the sele… ▽ More In this work we propose a new primal-dual algorithm with adaptive step-sizes. The stochastic primal-dual hybrid gradient (SPDHG) algorithm with constant step-sizes has become widely applied in large-scale convex optimization across many scientific fields due to its scalability. While the product of the primal and dual step-sizes is subject to an upper-bound in order to ensure convergence, the selection of the ratio of the step-sizes is critical in applications. Up-to-now there is no systematic and successful way of selecting the primal and dual step-sizes for SPDHG. In this work, we propose a general class of adaptive SPDHG (A-SPDHG) algorithms, and prove their convergence under weak assumptions. We also propose concrete parameters-updating strategies which satisfy the assumptions of our theory and thereby lead to convergent algorithms. Numerical examples on computed tomography demonstrate the effectiveness of the proposed schemes. △ Less

Submitted 4 December, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: 31 pages, 9 figures

MSC Class: 47N10; 49J40; 65D18; 65K10; 90C06; 90C15; 90C25; 92C55; 94A08

arXiv:2211.14212 [pdf, other]

doi 10.1088/1361-6560/acd616

On Krylov Methods for Large Scale CBCT Reconstruction

Authors: Malena Sabate Landman, Ander Biguri, Sepideh Hatamikia, Richard Boardman, John Aston, Carola-Bibiane Schonlieb

Abstract: Krylov subspace methods are a powerful family of iterative solvers for linear systems of equations, which are commonly used for inverse problems due to their intrinsic regularization properties. Moreover, these methods are naturally suited to solve large-scale problems, as they only require matrix-vector products with the system matrix (and its adjoint) to compute approximate solutions, and they d… ▽ More Krylov subspace methods are a powerful family of iterative solvers for linear systems of equations, which are commonly used for inverse problems due to their intrinsic regularization properties. Moreover, these methods are naturally suited to solve large-scale problems, as they only require matrix-vector products with the system matrix (and its adjoint) to compute approximate solutions, and they display a very fast convergence. Even if this class of methods has been widely researched and studied in the numerical linear algebra community, its use in applied medical physics and applied engineering is still very limited. e.g. in realistic large-scale Computed Tomography (CT) problems, and more specifically in Cone Beam CT (CBCT). This work attempts to breach this gap by providing a general framework for the most relevant Krylov subspace methods applied to 3D CT problems, including the most well-known Krylov solvers for non-square systems (CGLS, LSQR, LSMR), possibly in combination with Tikhonov regularization, and methods that incorporate total variation (TV) regularization. This is provided within an open source framework: the Tomographic Iterative GPU-based Reconstruction (TIGRE) toolbox, with the idea of promoting accessibility and reproducibility of the results for the algorithms presented. Finally, numerical results in synthetic and real-world 3D CT applications (medical CBCT and μ-CT datasets) are provided to showcase and compare the different Krylov subspace methods presented in the paper, as well as their suitability for different kinds of problems. △ Less

Submitted 25 November, 2022; originally announced November 2022.

Comments: submitted

arXiv:2211.05205 [pdf, ps, other]

Maximum Entropy on the Mean and the Cramér Rate Function in Statistical Estimation and Inverse Problems: Properties, Models, and Algorithms

Authors: Yakov Vaisbourd, Rustum Choksi, Ariel Goodwin, Tim Hoheisel, Carola-Bibiane Schönlieb

Abstract: We explore a method of statistical estimation called Maximum Entropy on the Mean (MEM) which is based on an information-driven criterion that quantifies the compliance of a given point with a reference prior probability measure. At the core of this approach lies the MEM function which is a partial minimization of the Kullback-Leibler divergence over a linear constraint. In many cases, it is known… ▽ More We explore a method of statistical estimation called Maximum Entropy on the Mean (MEM) which is based on an information-driven criterion that quantifies the compliance of a given point with a reference prior probability measure. At the core of this approach lies the MEM function which is a partial minimization of the Kullback-Leibler divergence over a linear constraint. In many cases, it is known that this function admits a simpler representation (known as the Cramér rate function). Via the connection to exponential families of probability distributions, we study general conditions under which this representation holds. We then address how the associated MEM estimator gives rise to a wide class of MEM-based regularized linear models for solving inverse problems. Finally, we propose an algorithmic framework to solve these problems efficiently based on the Bregman proximal gradient method, alongside proximal operators for commonly used reference distributions. The article is complemented by a software package for experimentation and exploration of the MEM approach in applications. △ Less

Submitted 16 December, 2022; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.12238 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096875

Robust Data-Driven Accelerated Mirror Descent

Authors: Hong Ye Tan, Subhadip Mukherjee, Junqi Tang, Andreas Hauptmann, Carola-Bibiane Schönlieb

Abstract: Learning-to-optimize is an emerging framework that leverages training data to speed up the solution of certain optimization problems. One such approach is based on the classical mirror descent algorithm, where the mirror map is modelled using input-convex neural networks. In this work, we extend this functional parameterization approach by introducing momentum into the iterations, based on the cla… ▽ More Learning-to-optimize is an emerging framework that leverages training data to speed up the solution of certain optimization problems. One such approach is based on the classical mirror descent algorithm, where the mirror map is modelled using input-convex neural networks. In this work, we extend this functional parameterization approach by introducing momentum into the iterations, based on the classical accelerated mirror descent. Our approach combines short-time accelerated convergence with stable long-time behavior. We empirically demonstrate additional robustness with respect to multiple parameters on denoising and deconvolution experiments. △ Less

Submitted 2 June, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: Note inconsistency with ICASSP paper for step-size choice in (4c) and associated Alg. 1, this version is correct with step-size kt/r

Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2210.02373 [pdf, other]

Dynamical systems' based neural networks

Authors: Elena Celledoni, Davide Murari, Brynjulf Owren, Carola-Bibiane Schönlieb, Ferdia Sherry

Abstract: Neural networks have gained much interest because of their effectiveness in many applications. However, their mathematical properties are generally not well understood. If there is some underlying geometric structure inherent to the data or to the function to approximate, it is often desirable to take this into account in the design of the neural network. In this work, we start with a non-autonomo… ▽ More Neural networks have gained much interest because of their effectiveness in many applications. However, their mathematical properties are generally not well understood. If there is some underlying geometric structure inherent to the data or to the function to approximate, it is often desirable to take this into account in the design of the neural network. In this work, we start with a non-autonomous ODE and build neural networks using a suitable, structure-preserving, numerical time-discretisation. The structure of the neural network is then inferred from the properties of the ODE vector field. Besides injecting more structure into the network architectures, this modelling procedure allows a better theoretical understanding of their behaviour. We present two universal approximation results and demonstrate how to impose some particular properties on the neural networks. A particular focus is on 1-Lipschitz architectures including layers that are not 1-Lipschitz. These networks are expressive and robust against adversarial attacks, as shown for the CIFAR-10 and CIFAR-100 datasets. △ Less

Submitted 31 August, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

MSC Class: 65L05; 65L06; 37M15

arXiv:2209.05546 [pdf, other]

doi 10.1088/1361-6420/acb2ba

Spectral decomposition of atomic structures in heterogeneous cryo-EM

Authors: Carlos Esteve-Yagüe, Willem Diepeveen, Ozan Öktem, Carola-Bibiane Schönlieb

Abstract: We consider the problem of recovering the three-dimensional atomic structure of a flexible macromolecule from a heterogeneous cryo-EM dataset. The dataset contains noisy tomographic projections of the electrostatic potential of the macromolecule, taken from different viewing directions, and in the heterogeneous case, each image corresponds to a different conformation of the macromolecule. Under th… ▽ More We consider the problem of recovering the three-dimensional atomic structure of a flexible macromolecule from a heterogeneous cryo-EM dataset. The dataset contains noisy tomographic projections of the electrostatic potential of the macromolecule, taken from different viewing directions, and in the heterogeneous case, each image corresponds to a different conformation of the macromolecule. Under the assumption that the macromolecule can be modelled as a chain, or discrete curve (as it is for instance the case for a protein backbone with a single chain of amino-acids), we introduce a method to estimate the deformation of the atomic model with respect to a given conformation, which is assumed to be known a priori. Our method consists on estimating the torsion and bond angles of the atomic model in each conformation as a linear combination of the eigenfunctions of the Laplace operator in the manifold of conformations. These eigenfunctions can be approximated by means of a well-known technique in manifold learning, based on the construction of a graph Laplacian using the cryo-EM dataset. Finally, we test our approach with synthetic datasets, for which we recover the atomic model of two-dimensional and three-dimensional flexible structures from noisy tomographic projections. △ Less

Submitted 27 December, 2022; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: 35 pages,20 figures

arXiv:2209.03045 [pdf, other]

doi 10.1137/22M1520773

Regularising orientation estimation in Cryo-EM 3D map refinement through measure-based lifting over Riemannian manifolds

Authors: Willem Diepeveen, Jan Lellmann, Ozan Öktem, Carola-Bibiane Schönlieb

Abstract: Motivated by the trade-off between noise-robustness and data-consistency for joint 3D map reconstruction and rotation estimation in single particle cryogenic-electron microscopy (Cryo-EM), we propose ellipsoidal support lifting (ESL), a measure-based lifting scheme for regularising and approximating the global minimiser of a smooth function over a Riemannian manifold. Under a uniqueness assumption… ▽ More Motivated by the trade-off between noise-robustness and data-consistency for joint 3D map reconstruction and rotation estimation in single particle cryogenic-electron microscopy (Cryo-EM), we propose ellipsoidal support lifting (ESL), a measure-based lifting scheme for regularising and approximating the global minimiser of a smooth function over a Riemannian manifold. Under a uniqueness assumption on the minimiser we show several theoretical results, in particular well-posedness of the method and an error bound due to the induced bias with respect to the global minimiser. Additionally, we use the developed theory to integrate the measure-based lifting scheme into an alternating update method for joint homogeneous 3D map reconstruction and rotation estimation, where typically tens of thousands of manifold-valued minimisation problems have to be solved and where regularisation is necessary because of the high noise levels in the data. The joint recovery method is used to test both the theoretical predictions and algorithmic performance through numerical experiments with Cryo-EM data. In particular, the induced bias due to the regularising effect of ESL empirically estimates better rotations, i.e., rotations closer to the ground truth, than global optimisation would. △ Less

Submitted 31 January, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

MSC Class: 90C26; 92E10; 68U10

arXiv:2208.14784 [pdf, other]

Practical Operator Sketching Framework for Accelerating Iterative Data-Driven Solutions in Inverse Problems

Authors: Junqi Tang, Guixian Xu, Subhadip Mukherjee, Carola-Bibiane Schönlieb

Abstract: We propose a new operator-sketching paradigm for designing efficient iterative data-driven reconstruction (IDR) schemes, e.g. Plug-and-Play algorithms and deep unrolling networks. These IDR schemes are currently the state-of-the-art solutions for imaging inverse problems. However, for high-dimensional imaging tasks, especially X-ray CT and MRI imaging, these IDR schemes typically become inefficien… ▽ More We propose a new operator-sketching paradigm for designing efficient iterative data-driven reconstruction (IDR) schemes, e.g. Plug-and-Play algorithms and deep unrolling networks. These IDR schemes are currently the state-of-the-art solutions for imaging inverse problems. However, for high-dimensional imaging tasks, especially X-ray CT and MRI imaging, these IDR schemes typically become inefficient both in terms of computation, due to the need of computing multiple times the high-dimensional forward and adjoint operators. In this work, we explore and propose a universal dimensionality reduction framework for accelerating IDR schemes in solving imaging inverse problems, based on leveraging the sketching techniques from stochastic optimization. Using this framework, we derive a number of accelerated IDR schemes, such as the plug-and-play multi-stage sketched gradient (PnP-MS2G) and sketching-based primal-dual (LSPD and Sk-LSPD) deep unrolling networks. Meanwhile, for fully accelerating PnP schemes when the denoisers are computationally expensive, we provide novel stochastic lazy denoising schemes (Lazy-PnP and Lazy-PnP-EQ), leveraging the ProxSkip scheme in optimization and equivariant image denoisers, which can massively accelerate the PnP algorithms with improved practicality. We provide theoretical analysis for recovery guarantees of instances of the proposed framework. Our numerical experiments on natural image processing and tomographic image reconstruction demonstrate the remarkable effectiveness of our sketched IDR schemes. △ Less

Submitted 5 December, 2024; v1 submitted 31 August, 2022; originally announced August 2022.

arXiv:2208.01631 [pdf, ps, other]

Stochastic Primal-Dual Three Operator Splitting Algorithm with Extension to Equivariant Regularization-by-Denoising

Authors: Junqi Tang, Matthias Ehrhardt, Carola-Bibiane Schönlieb

Abstract: In this work we propose a stochastic primal-dual three-operator splitting algorithm (TOS-SPDHG) for solving a class of convex three-composite optimization problems. Our proposed scheme is a direct three-operator splitting extension of the SPDHG algorithm [Chambolle et al. 2018]. We provide theoretical convergence analysis showing ergodic $O(1/K)$ convergence rate, and demonstrate the effectiveness… ▽ More In this work we propose a stochastic primal-dual three-operator splitting algorithm (TOS-SPDHG) for solving a class of convex three-composite optimization problems. Our proposed scheme is a direct three-operator splitting extension of the SPDHG algorithm [Chambolle et al. 2018]. We provide theoretical convergence analysis showing ergodic $O(1/K)$ convergence rate, and demonstrate the effectiveness of our approach in imaging inverse problems. Moreover, we further propose TOS-SPDHG-RED and TOS-SPDHG-eRED which utilizes the regularization-by-denoising (RED) framework to leverage pretrained deep denoising networks as priors. △ Less

Submitted 15 March, 2025; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: SSVM-2025

arXiv:2206.06733 [pdf, other]

Data-Driven Mirror Descent with Input-Convex Neural Networks

Authors: Hong Ye Tan, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

Abstract: Learning-to-optimize is an emerging framework that seeks to speed up the solution of certain optimization problems by leveraging training data. Learned optimization solvers have been shown to outperform classical optimization algorithms in terms of convergence speed, especially for convex problems. Many existing data-driven optimization methods are based on parameterizing the update step and learn… ▽ More Learning-to-optimize is an emerging framework that seeks to speed up the solution of certain optimization problems by leveraging training data. Learned optimization solvers have been shown to outperform classical optimization algorithms in terms of convergence speed, especially for convex problems. Many existing data-driven optimization methods are based on parameterizing the update step and learning the optimal parameters (typically scalars) from the available data. We propose a novel functional parameterization approach for learned convex optimization solvers based on the classical mirror descent (MD) algorithm. Specifically, we seek to learn the optimal Bregman distance in MD by modeling the underlying convex function using an input-convex neural network (ICNN). The parameters of the ICNN are learned by minimizing the target objective function evaluated at the MD iterate after a predetermined number of iterations. The inverse of the mirror map is modeled approximately using another neural network, as the exact inverse is intractable to compute. We derive convergence rate bounds for the proposed learned mirror descent (LMD) approach with an approximate inverse mirror map and perform extensive numerical evaluation on various convex problems such as image inpainting, denoising, learning a two-class support vector machine (SVM) classifier and a multi-class linear classifier on fixed features. △ Less

Submitted 24 February, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

MSC Class: 46N10 (Primary) 65K10; 65G50 (Secondary)

arXiv:2203.11156 [pdf, other]

Operator Sketching for Deep Unrolling Networks

Authors: Junqi Tang, Subhadip Mukherjee, Carola-Bibiane Schönlieb

Abstract: In this work we propose a new paradigm for designing efficient deep unrolling networks using operator sketching. The deep unrolling networks are currently the state-of-the-art solutions for imaging inverse problems. However, for high-dimensional imaging tasks, especially the 3D cone-beam X-ray CT and 4D MRI imaging, the deep unrolling schemes typically become inefficient both in terms of memory an… ▽ More In this work we propose a new paradigm for designing efficient deep unrolling networks using operator sketching. The deep unrolling networks are currently the state-of-the-art solutions for imaging inverse problems. However, for high-dimensional imaging tasks, especially the 3D cone-beam X-ray CT and 4D MRI imaging, the deep unrolling schemes typically become inefficient both in terms of memory and computation, due to the need of computing multiple times the high-dimensional forward and adjoint operators. Recently researchers have found that such limitations can be partially addressed by stochastic unrolling with subsets of operators, inspired by the success of stochastic first-order optimization. In this work, we propose a further acceleration upon stochastic unrolling, using sketching techniques to approximate products in the high-dimensional image space. The operator sketching can be jointly applied with stochastic unrolling for the best acceleration and compression performance. Our numerical experiments on X-ray CT image reconstruction demonstrate the remarkable effectiveness of our sketched unrolling schemes. △ Less

Submitted 6 June, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.04650 [pdf, ps, other]

Gaussian random fields on non-separable Banach spaces

Authors: Yury Korolev, Jonas Latz, Carola-Bibiane Schönlieb

Abstract: We study Gaussian random fields on certain Banach spaces and investigate conditions for their existence. Our results apply inter alia to spaces of Radon measures and Hölder functions. In the former case, we are able to define Gaussian white noise on the space of measures directly, avoiding, e.g., an embedding into a negative-order Sobolev space. In the latter case, we demonstrate how Hölder regula… ▽ More We study Gaussian random fields on certain Banach spaces and investigate conditions for their existence. Our results apply inter alia to spaces of Radon measures and Hölder functions. In the former case, we are able to define Gaussian white noise on the space of measures directly, avoiding, e.g., an embedding into a negative-order Sobolev space. In the latter case, we demonstrate how Hölder regularity of the samples is controlled by that of the covariance kernel and, thus, show a connection to the Theorem of Kolmogorov-Chentsov. △ Less

Submitted 9 March, 2022; originally announced March 2022.

MSC Class: 60G15; 46N30; 46B26

arXiv:2202.04965 [pdf, other]

$Γ$-Convergence of an Ambrosio-Tortorelli approximation scheme for image segmentation

Authors: Irene Fonseca, Lisa Maria Kreusser, Carola-Bibiane Schönlieb, Matthew Thorpe

Abstract: Given an image $u_0$, the aim of minimising the Mumford-Shah functional is to find a decomposition of the image domain into sub-domains and a piecewise smooth approximation $u$ of $u_0$ such that $u$ varies smoothly within each sub-domain. Since the Mumford-Shah functional is highly non-smooth, regularizations such as the Ambrosio-Tortorelli approximation can be considered which is one of the most… ▽ More Given an image $u_0$, the aim of minimising the Mumford-Shah functional is to find a decomposition of the image domain into sub-domains and a piecewise smooth approximation $u$ of $u_0$ such that $u$ varies smoothly within each sub-domain. Since the Mumford-Shah functional is highly non-smooth, regularizations such as the Ambrosio-Tortorelli approximation can be considered which is one of the most computationally efficient approximations of the Mumford-Shah functional for image segmentation. While very impressive numerical results have been achieved in a large range of applications when minimising the functional, no analytical results are currently available for minimizers of the functional in the piecewise smooth setting, and this is the goal of this work. Our main result is the $Γ$-convergence of the Ambrosio-Tortorelli approximation of the Mumford-Shah functional for piecewise smooth approximations. This requires the introduction of an appropriate function space. As a consequence of our $Γ$-convergence result, we can infer the convergence of minimizers of the respective functionals. △ Less

Submitted 4 September, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

arXiv:2112.03754 [pdf, other]

A Continuous-time Stochastic Gradient Descent Method for Continuous Data

Authors: Kexin Jin, Jonas Latz, Chenguang Liu, Carola-Bibiane Schönlieb

Abstract: Optimization problems with continuous data appear in, e.g., robust machine learning, functional data analysis, and variational inference. Here, the target function is given as an integral over a family of (continuously) indexed target functions - integrated with respect to a probability measure. Such problems can often be solved by stochastic optimization methods: performing optimization steps wit… ▽ More Optimization problems with continuous data appear in, e.g., robust machine learning, functional data analysis, and variational inference. Here, the target function is given as an integral over a family of (continuously) indexed target functions - integrated with respect to a probability measure. Such problems can often be solved by stochastic optimization methods: performing optimization steps with respect to the indexed target function with randomly switched indices. In this work, we study a continuous-time variant of the stochastic gradient descent algorithm for optimization problems with continuous data. This so-called stochastic gradient process consists in a gradient flow minimizing an indexed target function that is coupled with a continuous-time index process determining the index. Index processes are, e.g., reflected diffusions, pure jump processes, or other Lévy processes on compact spaces. Thus, we study multiple sampling patterns for the continuous data space and allow for data simulated or streamed at runtime of the algorithm. We analyze the approximation properties of the stochastic gradient process and study its longtime behavior and ergodicity under constant and decreasing learning rates. We end with illustrating the applicability of the stochastic gradient process in a polynomial regression problem with noisy functional data, as well as in a physics-informed neural network. △ Less

Submitted 7 December, 2021; originally announced December 2021.

Journal ref: Journal of Machine Learning Research 24(274), pp. 1-48, 2023

arXiv:2111.00534 [pdf, other]

Focal Attention Networks: optimising attention for biomedical image segmentation

Authors: Michael Yeung, Leonardo Rundo, Evis Sala, Carola-Bibiane Schönlieb, Guang Yang

Abstract: In recent years, there has been increasing interest to incorporate attention into deep learning architectures for biomedical image segmentation. The modular design of attention mechanisms enables flexible integration into convolutional neural network architectures, such as the U-Net. Whether attention is appropriate to use, what type of attention to use, and where in the network to incorporate att… ▽ More In recent years, there has been increasing interest to incorporate attention into deep learning architectures for biomedical image segmentation. The modular design of attention mechanisms enables flexible integration into convolutional neural network architectures, such as the U-Net. Whether attention is appropriate to use, what type of attention to use, and where in the network to incorporate attention modules, are all important considerations that are currently overlooked. In this paper, we investigate the role of the Focal parameter in modulating attention, revealing a link between attention in loss functions and networks. By incorporating a Focal distance penalty term, we extend the Unified Focal loss framework to include boundary-based losses. Furthermore, we develop a simple and interpretable, dataset and model-specific heuristic to integrate the Focal parameter into the Squeeze-and-Excitation block and Attention Gate, achieving optimal performance with fewer number of attention modules on three well-validated biomedical imaging datasets, suggesting judicious use of attention modules results in better performance and efficiency. △ Less

Submitted 31 October, 2021; originally announced November 2021.

arXiv:2111.00533 [pdf, other]

Incorporating Boundary Uncertainty into loss functions for biomedical image segmentation

Authors: Michael Yeung, Guang Yang, Evis Sala, Carola-Bibiane Schönlieb, Leonardo Rundo

Abstract: Manual segmentation is used as the gold-standard for evaluating neural networks on automated image segmentation tasks. Due to considerable heterogeneity in shapes, colours and textures, demarcating object boundaries is particularly difficult in biomedical images, resulting in significant inter and intra-rater variability. Approaches, such as soft labelling and distance penalty term, apply a global… ▽ More Manual segmentation is used as the gold-standard for evaluating neural networks on automated image segmentation tasks. Due to considerable heterogeneity in shapes, colours and textures, demarcating object boundaries is particularly difficult in biomedical images, resulting in significant inter and intra-rater variability. Approaches, such as soft labelling and distance penalty term, apply a global transformation to the ground truth, redefining the loss function with respect to uncertainty. However, global operations are computationally expensive, and neither approach accurately reflects the uncertainty underlying manual annotation. In this paper, we propose the Boundary Uncertainty, which uses morphological operations to restrict soft labelling to object boundaries, providing an appropriate representation of uncertainty in ground truth labels, and may be adapted to enable robust model training where systematic manual segmentation errors are present. We incorporate Boundary Uncertainty with the Dice loss, achieving consistently improved performance across three well-validated biomedical imaging datasets compared to soft labelling and distance-weighted penalty. Boundary Uncertainty not only more accurately reflects the segmentation process, but it is also efficient, robust to segmentation errors and exhibits better generalisation. △ Less

Submitted 31 October, 2021; originally announced November 2021.

arXiv:2111.00528 [pdf, other]

Calibrating the Dice loss to handle neural network overconfidence for biomedical image segmentation

Authors: Michael Yeung, Leonardo Rundo, Yang Nan, Evis Sala, Carola-Bibiane Schönlieb, Guang Yang

Abstract: The Dice similarity coefficient (DSC) is both a widely used metric and loss function for biomedical image segmentation due to its robustness to class imbalance. However, it is well known that the DSC loss is poorly calibrated, resulting in overconfident predictions that cannot be usefully interpreted in biomedical and clinical practice. Performance is often the only metric used to evaluate segment… ▽ More The Dice similarity coefficient (DSC) is both a widely used metric and loss function for biomedical image segmentation due to its robustness to class imbalance. However, it is well known that the DSC loss is poorly calibrated, resulting in overconfident predictions that cannot be usefully interpreted in biomedical and clinical practice. Performance is often the only metric used to evaluate segmentations produced by deep neural networks, and calibration is often neglected. However, calibration is important for translation into biomedical and clinical practice, providing crucial contextual information to model predictions for interpretation by scientists and clinicians. In this study, we provide a simple yet effective extension of the DSC loss, named the DSC++ loss, that selectively modulates the penalty associated with overconfident, incorrect predictions. As a standalone loss function, the DSC++ loss achieves significantly improved calibration over the conventional DSC loss across six well-validated open-source biomedical imaging datasets, including both 2D binary and 3D multi-class segmentation tasks. Similarly, we observe significantly improved calibration when integrating the DSC++ loss into four DSC-based loss functions. Finally, we use softmax thresholding to illustrate that well calibrated outputs enable tailoring of recall-precision bias, which is an important post-processing technique to adapt the model predictions to suit the biomedical or clinical task. The DSC++ loss overcomes the major limitation of the DSC loss, providing a suitable loss function for training deep learning segmentation models for use in biomedical and clinical practice. Source code is available at: https://github.com/mlyg/DicePlusPlus. △ Less

Submitted 1 November, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

arXiv:2110.10093 [pdf, other]

Stochastic Primal-Dual Deep Unrolling

Authors: Junqi Tang, Subhadip Mukherjee, Carola-Bibiane Schönlieb

Abstract: We propose a new type of efficient deep-unrolling networks for solving imaging inverse problems. Conventional deep-unrolling methods require full forward operator and its adjoint across each layer, and hence can be significantly more expensive computationally as compared with other end-to-end methods that are based on post-processing of model-based reconstructions, especially for 3D image reconstr… ▽ More We propose a new type of efficient deep-unrolling networks for solving imaging inverse problems. Conventional deep-unrolling methods require full forward operator and its adjoint across each layer, and hence can be significantly more expensive computationally as compared with other end-to-end methods that are based on post-processing of model-based reconstructions, especially for 3D image reconstruction tasks. We develop a stochastic (ordered-subsets) variant of the classical learned primal-dual (LPD), which is a state-of-the-art unrolling network for tomographic image reconstruction. The proposed learned stochastic primal-dual (LSPD) network only uses subsets of the forward and adjoint operators and offers considerable computational efficiency. We provide theoretical analysis of a special case of our LSPD framework, suggesting that it has the potential to achieve image reconstruction quality competitive with the full-batch LPD while requiring only a fraction of the computation. The numerical results for two different X-ray computed tomography (CT) imaging tasks (namely, low-dose and sparse-view CT) corroborate this theoretical finding, demonstrating the promise of LSPD networks for large-scale imaging problems. △ Less

Submitted 15 February, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Showing 1–50 of 132 results for author: Schönlieb, C