
Showing 1–25 of 25 results for author: Katsoulakis, M A

Searching in archive cs.
  1. arXiv:2505.13499  [pdf, ps, other]

    cs.LG cs.AI math.OC

    Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

    Authors: Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

    Abstract: We study Transformers through the perspective of optimal control theory, using tools from continuous-time formulations to derive actionable insights into training and architecture design. This framework improves the performance of existing Transformer models while providing desirable theoretical guarantees, including generalization and robustness. Our framework is designed to be plug-and-play, ena…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2410.01244  [pdf, other]

    stat.ML cs.LG

    Equivariant score-based generative models provably learn distributions with symmetries efficiently

    Authors: Ziyu Chen, Markos A. Katsoulakis, Benjamin J. Zhang

    Abstract: Symmetry is ubiquitous in many real-world phenomena and tasks, such as physics, images, and molecular simulations. Empirical studies have demonstrated that incorporating symmetries into generative models can provide better generalization and sampling efficiency when the underlying data distribution has group symmetry. In this work, we provide the first theoretical analysis and guarantees of score-…

    Submitted 2 October, 2024; originally announced October 2024.

  3. arXiv:2407.11901  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Combining Wasserstein-1 and Wasserstein-2 proximals: robust manifold learning via well-posed generative flows

    Authors: Hyemin Gu, Markos A. Katsoulakis, Luc Rey-Bellet, Benjamin J. Zhang

    Abstract: We formulate well-posed continuous-time generative flows for learning distributions that are supported on low-dimensional manifolds through Wasserstein proximal regularizations of $f$-divergences. Wasserstein-1 proximal operators regularize $f$-divergences so that singular distributions can be compared. Meanwhile, Wasserstein-2 proximal operators regularize the paths of the generative flows by add…

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2405.15754  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Score-based generative models are provably robust: an uncertainty quantification perspective

    Authors: Nikiforos Mimikos-Stamatopoulos, Benjamin J. Zhang, Markos A. Katsoulakis

    Abstract: Through an uncertainty quantification (UQ) perspective, we show that score-based generative models (SGMs) are provably robust to the multiple sources of error in practical implementation. Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem, a model-form UQ bound that describes how the $L^2$ error from learning the score function propagates to a Wasserstein-1 ($\mathbf{d}_1$)…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2405.15625  [pdf, ps, other]

    stat.ML cs.LG

    Nonlinear denoising score matching for enhanced learning of structured distributions

    Authors: Jeremiah Birrell, Markos A. Katsoulakis, Luc Rey-Bellet, Benjamin J. Zhang, Wei Zhu

    Abstract: We present a novel method for training score-based generative models which uses nonlinear noising dynamics to improve learning of structured distributions. Generalizing to a nonlinear drift allows for additional structure to be incorporated into the dynamics, thus making the training better adapted to the data, e.g., in the case of multimodality or (approximate) symmetries. Such structure can be o…

    Submitted 8 July, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 16 pages, 8 figures

  6. arXiv:2405.13962  [pdf, other]

    stat.ML cs.LG

    Robust Generative Learning with Lipschitz-Regularized $α$-Divergences Allows Minimal Assumptions on Target Distributions

    Authors: Ziyu Chen, Hyemin Gu, Markos A. Katsoulakis, Luc Rey-Bellet, Wei Zhu

    Abstract: This paper demonstrates the robustness of Lipschitz-regularized $α$-divergences as objective functionals in generative modeling, showing they enable stable learning across a wide range of target distributions with minimal assumptions. We establish that these divergences remain finite under a mild condition (that the source distribution has a finite first moment), regardless of the properties of the t…

    Submitted 23 November, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 43 pages, 6 figures and 2 tables in the main text

  7. arXiv:2402.06162  [pdf, other]

    stat.ML cs.LG

    Wasserstein proximal operators describe score-based generative models and resolve memorization

    Authors: Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, Stanley J. Osher

    Abstract: We focus on the fundamental mathematical structure of score-based generative models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO formulation reveals mathematical structure that describes the inductive bias of diffusion and score-based models. In particular, MFGs yield optimality conditions in the form…

    Submitted 8 February, 2024; originally announced February 2024.

  8. arXiv:2305.13517  [pdf, other]

    stat.ML cs.LG

    Statistical Guarantees of Group-Invariant GANs

    Authors: Ziyu Chen, Markos A. Katsoulakis, Luc Rey-Bellet, Wei Zhu

    Abstract: This work presents the first statistical performance guarantees for group-invariant generative models. Many real data, such as images and molecules, are invariant to certain group symmetries, which can be taken advantage of to learn more efficiently as we rigorously demonstrate in this work. Here we specifically study generative adversarial networks (GANs), and quantify the gains when incorporatin…

    Submitted 10 March, 2025; v1 submitted 22 May, 2023; originally announced May 2023.

    MSC Class: 62E10; 62E17; 60-08

  9. arXiv:2304.13534  [pdf, other]

    stat.ML cs.LG

    A mean-field games laboratory for generative modeling

    Authors: Benjamin J. Zhang, Markos A. Katsoulakis

    Abstract: We demonstrate the versatility of mean-field games (MFGs) as a mathematical framework for explaining, enhancing, and designing generative models. In generative flows, a Lagrangian formulation is used where each particle (generated sample) aims to minimize a loss function over its simulated path. The loss, however, is dependent on the paths of other particles, which leads to a competition among the…

    Submitted 24 October, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: 56 pages, 10 figures. Version 5 has a slightly modified version of the normalizing flow and improved introduction and conclusions

  10. arXiv:2210.17230  [pdf, other]

    stat.ML cs.LG

    Lipschitz-regularized gradient flows and generative particle algorithms for high-dimensional scarce data

    Authors: Hyemin Gu, Panagiota Birmpa, Yannis Pantazis, Luc Rey-Bellet, Markos A. Katsoulakis

    Abstract: We build a new class of generative algorithms capable of efficiently learning an arbitrary target distribution from possibly scarce, high-dimensional data and subsequently generate new samples. These generative algorithms are particle-based and are constructed as gradient flows of Lipschitz-regularized Kullback-Leibler or other $f$-divergences, where data from a source distribution can be stably t…

    Submitted 27 August, 2024; v1 submitted 31 October, 2022; originally announced October 2022.

    MSC Class: 35Q84; 49Q22; 62B10; 65C35; 68T07; 94A17

  11. arXiv:2210.04974  [pdf, ps, other]

    stat.ML cs.LG

    Function-space regularized Rényi divergences

    Authors: Jeremiah Birrell, Yannis Pantazis, Paul Dupuis, Markos A. Katsoulakis, Luc Rey-Bellet

    Abstract: We propose a new family of regularized Rényi divergences parametrized not only by the order $α$ but also by a variational function space. These new objects are defined by taking the infimal convolution of the standard Rényi divergence with the integral probability metric (IPM) associated with the chosen function space. We derive a novel dual variational representation that can be used to construct…

    Submitted 14 February, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: 24 pages, 4 figures

  12. arXiv:2202.01129  [pdf, other]

    cs.LG math.PR stat.ML

    Structure-preserving GANs

    Authors: Jeremiah Birrell, Markos A. Katsoulakis, Luc Rey-Bellet, Wei Zhu

    Abstract: Generative adversarial networks (GANs), a class of distribution-learning methods based on a two-player game between a generator and a discriminator, can generally be formulated as a minmax problem based on the variational representation of a divergence between the unknown and the generated distributions. We introduce structure-preserving GANs as a data-efficient framework for learning distribution…

    Submitted 17 June, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: 39 pages, 16 figures

  13. arXiv:2107.08179  [pdf, other]

    stat.ML cs.IT cs.LG math.PR

    Model Uncertainty and Correctability for Directed Graphical Models

    Authors: Panagiota Birmpa, Jinchao Feng, Markos A. Katsoulakis, Luc Rey-Bellet

    Abstract: Probabilistic graphical models are a fundamental tool in probabilistic modeling, machine learning and artificial intelligence. They allow us to integrate in a natural way expert knowledge, physical modeling, heterogeneous and correlated data and quantities of interest. For exactly this reason, multiple sources of model uncertainty are inherent within the modular structure of the graphical model. I…

    Submitted 17 July, 2021; originally announced July 2021.

    MSC Class: 62H22; 62P30; 68T37; 80A30; 93B35; 94A17

  14. arXiv:2011.05953  [pdf, ps, other]

    stat.ML cs.LG

    $(f,Γ)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics

    Authors: Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis, Yannis Pantazis, Luc Rey-Bellet

    Abstract: We develop a rigorous and general framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We prove under which assumptions these divergences, hereafter referred to as $(f,Γ)$-divergences, provide a notion of 'distance' between probability measures and show that they can be expresse…

    Submitted 15 September, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: 49 pages

  15. arXiv:2009.04570  [pdf, other]

    cs.LG math.NA stat.ML

    Mutual Information for Explainable Deep Learning of Multiscale Systems

    Authors: Søren Taverniers, Eric J. Hall, Markos A. Katsoulakis, Daniel M. Tartakovsky

    Abstract: Timely completion of design cycles for complex systems ranging from consumer electronics to hypersonic vehicles relies on rapid simulation-based prototyping. The latter typically involves high-dimensional spaces of possibly correlated control variables (CVs) and quantities of interest (QoIs) with non-Gaussian and possibly multimodal distributions. We develop a model-agnostic, moment-independent gl…

    Submitted 19 May, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

    Comments: 27 pages, 8 figures. Added additional examples

    MSC Class: 93B35 (Primary) 68T07; 62R07 (Secondary)

  16. arXiv:2009.00038  [pdf, other]

    stat.ML cs.IT cs.LG math.PR

    Uncertainty quantification for Markov Random Fields

    Authors: Panagiota Birmpa, Markos A. Katsoulakis

    Abstract: We present an information-based uncertainty quantification method for general Markov Random Fields. Markov Random Fields (MRF) are structured, probabilistic graphical models over undirected graphs, and provide a fundamental unifying modeling tool for statistical mechanics, probabilistic machine learning, and artificial intelligence. Typically MRFs are complex and high-dimensional with nodes and ed…

    Submitted 17 July, 2021; v1 submitted 31 August, 2020; originally announced September 2020.

    MSC Class: 62H22; 82B20; 94A17

  17. arXiv:2007.03814  [pdf, ps, other]

    stat.ML cs.IT cs.LG math.PR

    Variational Representations and Neural Network Estimation of Rényi Divergences

    Authors: Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis, Luc Rey-Bellet, Jie Wang

    Abstract: We derive a new variational formula for the Rényi family of divergences, $R_α(Q\|P)$, between probability measures $Q$ and $P$. Our result generalizes the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. We further show that this Rényi variational formula holds over a range of function spaces; this leads to a formula for the optimizer under very weak assumptions…

    Submitted 20 July, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: 24 pages, 2 figures

    MSC Class: 94A17; 62B10; 62G05

  18. arXiv:2006.08781  [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation

    Authors: Jeremiah Birrell, Markos A. Katsoulakis, Yannis Pantazis

    Abstract: Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their…

    Submitted 23 March, 2022; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 48 pages, 6 figures

  19. arXiv:1906.09282  [pdf, ps, other]

    math.PR cs.IT

    Quantification of Model Uncertainty on Path-Space via Goal-Oriented Relative Entropy

    Authors: Jeremiah Birrell, Markos A. Katsoulakis, Luc Rey-Bellet

    Abstract: Quantifying the impact of parametric and model-form uncertainty on the predictions of stochastic models is a key challenge in many applications. Previous work has shown that the relative entropy rate is an effective tool for deriving path-space uncertainty quantification (UQ) bounds on ergodic averages. In this work we identify appropriate information-theoretic objects for a wider range of quantit…

    Submitted 2 September, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: 35 pages, 10 figures

    MSC Class: 62F35; 62B10; 60G40; 60J60; 93E20; 91G20

  20. arXiv:1706.10260  [pdf, other]

    cs.IT math.PR

    How biased is your model? Concentration Inequalities, Information and Model Bias

    Authors: Konstantinos Gourgoulias, Markos A. Katsoulakis, Luc Rey-Bellet, Jie Wang

    Abstract: We derive tight and computable bounds on the bias of statistical estimators, or more generally of quantities of interest, when evaluated on a baseline model P rather than on the typically unknown true model Q. Our proposed method combines the scalable information inequality derived by P. Dupuis, K. Chowdhary, the authors and their collaborators together with classical concentration inequalities (su…

    Submitted 30 June, 2017; originally announced June 2017.

  21. Scalable Information Inequalities for Uncertainty Quantification

    Authors: Markos A. Katsoulakis, Luc Rey-Bellet, Jie Wang

    Abstract: In this paper we demonstrate the only available scalable information bounds for quantities of interest of high dimensional probabilistic models. Scalability of inequalities allows us to (a) obtain uncertainty quantification bounds for quantities of interest in the large degree of freedom limit and/or at long time regimes; (b) assess the impact of large model perturbations as in nonlinear response…

    Submitted 13 May, 2016; originally announced May 2016.

  22. Information Criteria for quantifying loss of reversibility in parallelized KMC

    Authors: Konstantinos Gourgoulias, Markos A. Katsoulakis, Luc Rey-Bellet

    Abstract: Parallel Kinetic Monte Carlo (KMC) is a potent tool to simulate stochastic particle systems efficiently. However, despite literature on quantifying domain decomposition errors of the particle system for this class of algorithms in the short and in the long time regime, no study yet explores and quantifies the loss of time-reversibility in Parallel KMC. Inspired by concepts from non-equilibrium sta…

    Submitted 16 October, 2016; v1 submitted 8 May, 2016; originally announced May 2016.

    Comments: 29 pages

  23. arXiv:1412.6482  [pdf, other]

    cs.IT physics.data-an

    Parametric Sensitivity Analysis for Stochastic Molecular Systems using Information Theoretic Metrics

    Authors: Anastasios Tsourtis, Yannis Pantazis, Markos A. Katsoulakis, Vagelis Harmandaris

    Abstract: In this paper we extend the parametric sensitivity analysis (SA) methodology proposed in Ref. [Y. Pantazis and M. A. Katsoulakis, J. Chem. Phys. 138, 054115 (2013)] to continuous time and continuous space Markov processes represented by stochastic differential equations and, particularly, stochastic molecular dynamics as described by the Langevin equation. The utilized SA method is based on the co…

    Submitted 19 December, 2014; originally announced December 2014.

    Comments: 18 pages, Relative Entropy, Sensitivity Analysis, Fisher Information Matrix, Langevin dynamics, Methane Molecular Dynamics

  24. arXiv:1304.7700  [pdf, ps, other]

    physics.comp-ph cs.IT physics.data-an

    Information-theoretic tools for parametrized coarse-graining of non-equilibrium extended systems

    Authors: Markos A. Katsoulakis, Petr Plechac

    Abstract: In this paper we focus on the development of new methods suitable for efficient and reliable coarse-graining of non-equilibrium molecular systems. In this context, we propose error estimation and controlled-fidelity model reduction methods based on Path-Space Information Theory, and combine it with statistical parametric estimation of rates for non-equilibrium stationary processes. The appro…

    Submitted 30 July, 2013; v1 submitted 29 April, 2013; originally announced April 2013.

    Comments: 14 pages, 6 figures, expanded version v2 with additional benchmark

    MSC Class: 82-08; 82-C20; 82-C22; 82-C80

  25. arXiv:1304.3962  [pdf, ps, other]

    cs.IT q-bio.MN

    Parametric Sensitivity Analysis for Biochemical Reaction Networks based on Pathwise Information Theory

    Authors: Yannis Pantazis, Markos A. Katsoulakis, Dionisios G. Vlachos

    Abstract: Stochastic modeling and simulation provide powerful predictive methods for the intrinsic understanding of fundamental mechanisms in complex biochemical networks. Typically, such mathematical models involve networks of coupled jump stochastic processes with a large number of parameters that need to be suitably calibrated against experimental data. In this direction, the parameter sensitivity analys…

    Submitted 1 August, 2013; v1 submitted 14 April, 2013; originally announced April 2013.