
Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints

Authors: Utkarsh Utkarsh, Pengfei Cai, Alan Edelman, Rafael Gomez-Bombarelli, Christopher Vincent Rackauckas

Abstract: Deep generative models have recently been applied to physical systems governed by partial differential equations (PDEs), offering scalable simulation and uncertainty-aware inference. However, enforcing physical constraints, such as conservation laws (linear and nonlinear) and physical consistencies, remains challenging. Existing methods often rely on soft penalties or architectural biases that fail to guarantee hard constraints. In this work, we propose Physics-Constrained Flow Matching (PCFM), a zero-shot inference framework that enforces arbitrary nonlinear constraints in pretrained flow-based generative models. PCFM continuously guides the sampling process through physics-based corrections applied to intermediate solution states, while remaining aligned with the learned flow and satisfying physical constraints. Empirically, PCFM outperforms both unconstrained and constrained baselines on a range of PDEs, including those with shocks, discontinuities, and sharp features, while ensuring exact constraint satisfaction at the final solution. Our method provides a general framework for enforcing hard constraints in both scientific and general-purpose generative models, especially in applications where constraint satisfaction is essential.

Submitted 4 June, 2025; originally announced June 2025.

Comments: 27 pages, 9 figures, 4 tables

arXiv:2505.20515 [pdf, ps, other]

Semi-Explicit Neural DAEs: Learning Long-Horizon Dynamical Systems with Algebraic Constraints

Authors: Avik Pal, Alan Edelman, Christopher Rackauckas

Abstract: Despite the promise of scientific machine learning (SciML) in combining data-driven techniques with mechanistic modeling, existing approaches for incorporating hard constraints in neural differential equations (NDEs) face significant limitations. Scalability issues and poor numerical properties prevent these neural models from being used for modeling physical systems with complicated conservation laws. We propose Manifold-Projected Neural ODEs (PNODEs), a method that explicitly enforces algebraic constraints by projecting each ODE step onto the constraint manifold. This framework arises naturally from semi-explicit differential-algebraic equations (DAEs), and includes both a robust iterative variant and a fast approximation requiring a single Jacobian factorization. We further demonstrate that prior works on relaxation methods are special cases of our approach. PNODEs consistently outperform baselines across six benchmark problems achieving a mean constraint violation error below $10^{-10}$. Additionally, PNODEs consistently achieve lower runtime compared to other methods for a given level of error tolerance. These results show that constraint projection offers a simple strategy for learning physically consistent long-horizon dynamics.

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2504.13821 [pdf, other]

Toward Portable GPU Performance: Julia Recursive Implementation of TRMM and TRSM

Authors: Vicki Carrica, Maxwell Onyango, Rabab Alomairy, Evelyne Ringoot, James Schloss, Alan Edelman

Abstract: This paper presents a performant and portable recursive implementation of triangular matrix-matrix multiplication (TRMM) and triangular solve (TRSM) in Julia for GPUs, two kernels that underlie many linear-algebra algorithms. We restructure TRMM and TRSM so that most work is executed as general matrix-matrix multiplication (GEMM), improving use of the GPU memory hierarchy and reducing latency. Exploiting Julia's multiple dispatch and metaprogramming together with the GPUArrays and KernelAbstractions frameworks, we expose a single hardware-agnostic API that runs on NVIDIA, AMD, and Apple Silicon GPUs. For large matrices the recursive code reaches throughput comparable to vendor libraries such as cuBLAS and rocBLAS, while providing these routines on Apple Silicon for the first time. The entire implementation is only a few hundred lines of code, showing that unified Julia programs can deliver near-vendor performance across heterogeneous architectures.

Submitted 18 April, 2025; originally announced April 2025.

arXiv:2501.14787 [pdf, other]

Matrix Calculus (for Machine Learning and Beyond)

Authors: Paige Bright, Alan Edelman, Steven G. Johnson

Abstract: This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions. It emphasizes practical computational applications, such as large-scale optimization and machine learning, where derivatives must be re-imagined in order to be propagated through complicated calculations. The class also discusses efficiency concerns leading to "adjoint" or "reverse-mode" differentiation (a.k.a. "backpropagation"), and gives a gentle introduction to modern automatic differentiation (AD) techniques.

Submitted 7 January, 2025; originally announced January 2025.

Comments: Lecture notes for the MIT short course 18.063 "Matrix Calculus", based on the class as taught in January 2023 (also available on MIT OpenCourseWare)

arXiv:2309.06662 [pdf, other]

Oceananigans.jl: A Julia library that achieves breakthrough resolution, memory and energy efficiency in global ocean simulations

Authors: Simone Silvestri, Gregory L. Wagner, Christopher Hill, Matin Raayai Ardakani, Johannes Blaschke, Jean-Michel Campin, Valentin Churavy, Navid C. Constantinou, Alan Edelman, John Marshall, Ali Ramadhan, Andre Souza, Raffaele Ferrari

Abstract: Climate models must simulate hundreds of future scenarios for hundreds of years at coarse resolutions, and a handful of high-resolution decadal simulations to resolve localized extreme events. Using Oceananigans.jl, written from scratch in Julia, we report several achievements: First, a global ocean simulation with breakthrough horizontal resolution -- 488m -- reaching 15 simulated days per day (0.04 simulated years per day; SYPD). Second, Oceananigans simulates the global ocean at 488m with breakthrough memory efficiency on just 768 Nvidia A100 GPUs, a fraction of the resources available on current and upcoming exascale supercomputers. Third, and arguably most significant for climate modeling, Oceananigans achieves breakthrough energy efficiency reaching 0.95 SYPD at 1.7 km on 576 A100s and 9.9 SYPD at 10 km on 68 A100s -- the latter representing the highest horizontal resolutions employed by current IPCC-class ocean models. Routine climate simulations with 10 km ocean components are within reach.

Submitted 14 October, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

arXiv:2304.06835 [pdf, other]

doi 10.1016/j.cma.2023.116591

Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms

Authors: Utkarsh Utkarsh, Valentin Churavy, Yingbo Ma, Tim Besard, Prakitr Srisuma, Tim Gymnich, Adam R. Gerlach, Alan Edelman, George Barbastathis, Richard D. Braatz, Christopher Rackauckas

Abstract: We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels while performing 20--100$\times$ faster than the vectorizing map (vmap) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured -- supporting event handling, automatic differentiation, and incorporation of datasets via the GPU's texture memory -- allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance. We distribute the software as an open-source library https://github.com/SciML/DiffEqGPU.jl

Submitted 13 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: 14 figures

Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 419, 2024

arXiv:2303.15449 [pdf, other]

Backpropagation through Back Substitution with a Backslash

Authors: Alan Edelman, Ekin Akyurek, Yuyang Wang

Abstract: We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written "backslash" or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach; (ii) operators can be readily placed in matrices in software in programming languages such as Julia as an implementation option; (iii) we introduce a novel notation, "transpose dot" operator "$\{\}^{T_\bullet}$" that allows for the reversal of operators. We further demonstrate the elegance of the operators approach in a suitable programming language consisting of generic linear algebra operators such as Julia, and that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices. In contrast to "operator overloading," where backslash would normally have to be rewritten to take advantage of operators, with "generic programming" there is no such need.

Submitted 30 August, 2023; v1 submitted 23 February, 2023; originally announced March 2023.

Comments: 22 pages

arXiv:2303.02262 [pdf, other]

Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed!

Authors: Avik Pal, Alan Edelman, Chris Rackauckas

Abstract: Implicit layer deep learning techniques, like Neural Differential Equations, have become an important modeling framework due to their ability to adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost for these models is difficult since it relies on the number of steps the adaptive solver takes. Most prior works have used higher-order methods to reduce prediction timings while greatly increasing training time or reducing both training and prediction timings by relying on specific training algorithms, which are harder to use as a drop-in replacement due to strict requirements on automatic differentiation. In this manuscript, we use internal cost heuristics of adaptive differential equation solvers at stochastic time points to guide the training toward learning a dynamical system that is easier to integrate. We "close the black-box" and allow the use of our method with any adjoint technique for gradient calculations of the differential equation solution. We perform experimental studies to compare our method to global regularization to show that we attain similar performance numbers without compromising the flexibility of implementation on ordinary differential equations (ODEs) and stochastic differential equations (SDEs). We develop two sampling strategies to trade off between performance and training time. Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.

Submitted 2 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023

arXiv:2211.02740 [pdf, other]

Bridging HPC Communities through the Julia Programming Language

Authors: Valentin Churavy, William F Godoy, Carsten Bauer, Hendrik Ranocha, Michael Schlottke-Lakemper, Ludovic Räss, Johannes Blaschke, Mosè Giordano, Erik Schnetter, Samuel Omlin, Jeffrey S. Vetter, Alan Edelman

Abstract: The Julia programming language has evolved into a modern alternative to fill existing gaps in scientific computing and data science applications. Julia leverages a unified and coordinated single-language and ecosystem paradigm and has a proven track record of achieving high performance without sacrificing user productivity. These aspects make Julia a viable alternative to high-performance computing's (HPC's) existing and increasingly costly many-body workflow composition strategy in which traditional HPC languages (e.g., Fortran, C, C++) are used for simulations, and higher-level languages (e.g., Python, R, MATLAB) are used for data analysis and interactive computing. Julia's rapid growth in language capabilities, package ecosystem, and community make it a promising universal language for HPC. This paper presents the views of a multidisciplinary group of researchers from academia, government, and industry that advocate for an HPC software development paradigm that emphasizes developer productivity, workflow portability, and low barriers for entry. We believe that the Julia programming language, its ecosystem, and its community provide modern and powerful capabilities that enable this group's objectives. Crucially, we believe that Julia can provide a feasible and less costly approach to programming scientific applications and workflows that target HPC facilities.
In this work, we examine the current practice and role of Julia as a common, end-to-end programming model to address major challenges in scientific reproducibility, data-driven AI/machine learning, co-design and workflows, scalability and performance portability in heterogeneous computing, network communication, data management, and community education. As a result, the diversification of current investments to fulfill the needs of the upcoming decade is crucial as more supercomputing centers prepare for the exascale era.

Submitted 10 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

Comments: 20 pages; improved image quality

arXiv:2201.12240 [pdf, other]

Continuous Deep Equilibrium Models: Training Neural ODEs faster by integrating them to Infinity

Authors: Avik Pal, Alan Edelman, Christopher Rackauckas

Abstract: Implicit models separate the definition of a layer from the description of its solution process. While implicit layers allow features such as depth to adapt to new scenarios and inputs automatically, this adaptivity makes its computational expense challenging to predict. In this manuscript, we increase the "implicitness" of the DEQ by redefining the method in terms of an infinite time neural ODE, which paradoxically decreases the training cost over a standard neural ODE by 2-4x. Additionally, we address the question: is there a way to simultaneously achieve the robustness of implicit layers while allowing the reduced computational expense of an explicit layer? To solve this, we develop Skip and Skip Reg. DEQ, an implicit-explicit (IMEX) layer that simultaneously trains an explicit prediction followed by an implicit correction. We show that training this explicit predictor is free and even decreases the training time by 1.11-3.19x. Together, this manuscript shows how bridging the dichotomy of implicit and explicit deep learning can combine the advantages of both techniques.

Submitted 3 March, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2105.05946 [pdf, other]

Composing Modeling and Simulation with Machine Learning in Julia

Authors: Chris Rackauckas, Ranjan Anantharaman, Alan Edelman, Shashi Gowda, Maja Gwozdz, Anand Jain, Chris Laughman, Yingbo Ma, Francesco Martinuzzi, Avik Pal, Utkarsh Rajput, Elliot Saba, Viral B. Shah

Abstract: In this paper we introduce JuliaSim, a high-performance programming environment designed to blend traditional modeling and simulation with machine learning. JuliaSim can build accelerated surrogates from component-based models, such as those conforming to the FMI standard, using continuous-time echo state networks (CTESN). The foundation of this environment, ModelingToolkit.jl, is an acausal modeling language which can compose the trained surrogates as components within its staged compilation process. As a complementary factor we present the JuliaSim model library, a standard library with differential-algebraic equations and pre-trained surrogates, which can be composed using the modeling system for design, optimization, and control. We demonstrate the effectiveness of the surrogate-accelerated modeling and simulation approach on HVAC dynamics by showing that the CTESN surrogates accurately capture the dynamics of an HVAC cycle at less than 4% error while accelerating its simulation by 340x. We illustrate the use of surrogate acceleration in the design process via global optimization of simulation parameters using the embedded surrogate, yielding a speedup of two orders of magnitude to find the optimum. We showcase the surrogate deployed in a co-simulation loop, as a drop-in replacement for one of the coupled FMUs, allowing engineers to effectively explore the design space of a coupled system. Together this demonstrates a workflow for automating the integration of machine learning techniques into traditional modeling and simulation processes.

Submitted 12 May, 2021; originally announced May 2021.

arXiv:2105.03949 [pdf, other]

High-performance symbolic-numerics via multiple dispatch

Authors: Shashi Gowda, Yingbo Ma, Alessandro Cheli, Maja Gwozdz, Viral B. Shah, Alan Edelman, Christopher Rackauckas

Abstract: As mathematical computing becomes more democratized in high-level languages, high-performance symbolic-numeric systems are necessary for domain scientists and engineers to get the best performance out of their machine without deep knowledge of code optimization. Naturally, users need different term types either to have different algebraic properties for them, or to use efficient data structures. To this end, we developed Symbolics.jl, an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs. In this work we detail an underlying abstract term interface which allows for speed without sacrificing generality. We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system without changing the pre-existing term rewriters. We showcase how this can be used to optimize term construction and give a 113x acceleration on general symbolic transformations. Further, we show that such a generic API allows for complementary term-rewriting implementations. We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers. We showcase an e-graph ruleset which minimizes the number of CPU cycles during expression evaluation, and demonstrate how it simplifies a real-world reaction-network simulation to halve the runtime.
Additionally, we show a reaction-diffusion partial differential equation solver which is able to be automatically converted into symbolic expressions via multiple dispatch tracing, which is subsequently accelerated and parallelized to give a 157x simulation speedup. Together, this presents Symbolics.jl as a next-generation symbolic-numeric computing environment geared towards modeling and simulation.

Submitted 5 February, 2022; v1 submitted 9 May, 2021; originally announced May 2021.

ACM Class: D.3.3; I.1.1; I.1.3

arXiv:2012.14201 [pdf]

StudyU: a platform for designing and conducting innovative digital N-of-1 trials

Authors: Stefan Konigorski, Sarah Wernicke, Tamara Slosarek, Alexander M. Zenner, Nils Strelow, Ferenc D. Ruether, Florian Henschel, Manisha Manaswini, Fabian Pottbäcker, Jonathan A. Edelman, Babajide Owoyele, Matteo Danieletto, Eddye Golden, Micol Zweig, Girish Nadkarni, Erwin Böttinger

Abstract: N-of-1 trials are the gold standard study design to evaluate individual treatment effects and derive personalized treatment strategies. Digital tools have the potential to initiate a new era of N-of-1 trials in terms of scale and scope, but fully-functional platforms are not yet available. Here, we present the open source StudyU platform which includes the StudyU designer and StudyU app. With the StudyU designer, scientists are given a collaborative web application to digitally specify, publish, and conduct N-of-1 trials. The StudyU app is a smartphone application with innovative user-centric elements for participants to partake in the published trials and assess the effects of different interventions on their health. Thereby, the StudyU platform allows clinicians and researchers worldwide to easily design and conduct digital N-of-1 trials in a safe manner. We envision that StudyU can change the landscape of personalized treatments both for patients and healthy individuals, democratize and personalize evidence generation for self-optimization and medicine, and can be integrated in clinical practice.

Submitted 12 July, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: Manuscript: 14 pages, 5 figures. Supplements: 23 pages, Supplementary Text 1-5, Supplementary Figures 1-5

arXiv:2011.04426 [pdf, other]

AutoMat: Accelerated Computational Electrochemical systems Discovery

Authors: Emil Annevelink, Rachel Kurchin, Eric Muckley, Lance Kavalsky, Vinay I. Hegde, Valentin Sulzer, Shang Zhu, Jiankun Pu, David Farina, Matthew Johnson, Dhairya Gandhi, Adarsh Dave, Hongyi Lin, Alan Edelman, Bharath Ramsundar, James Saal, Christopher Rackauckas, Viral Shah, Bryce Meredig, Venkatasubramanian Viswanathan

Abstract: Large-scale electrification is vital to addressing the climate crisis, but several scientific and technological challenges remain to fully electrify both the chemical industry and transportation. In both of these areas, new electrochemical materials will be critical, but their development currently relies heavily on human-time-intensive experimental trial and error and computationally expensive first-principles, meso-scale and continuum simulations. We present an automated workflow, AutoMat, that accelerates these computational steps by introducing both automated input generation and management of simulations across scales from first principles to continuum device modeling. Furthermore, we show how to seamlessly integrate multi-fidelity predictions such as machine learning surrogates or automated robotic experiments "in-the-loop". The automated framework is implemented with design space search techniques to dramatically accelerate the overall materials discovery pipeline by implicitly learning design features that optimize device performance across several metrics. We discuss the benefits of AutoMat using examples in electrocatalysis and energy storage and highlight lessons learned.

Submitted 13 May, 2022; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: v1-3: 4 pages, 1 figure, accepted to NeurIPS Climate Change and AI Workshop 2020, updating acknowledgements and citations; v4: substantially updated content and author list, accepted to MRS Bulletin

arXiv:2010.04004 [pdf, other]

Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks

Authors: Ranjan Anantharaman, Yingbo Ma, Shashi Gowda, Chris Laughman, Viral Shah, Alan Edelman, Chris Rackauckas

Abstract: Modern design, control, and optimization often require simulation of highly nonlinear models, leading to prohibitive computational costs. These costs can be amortized by evaluating a cheap surrogate of the full model. Here we present a general data-driven method, the continuous-time echo state network (CTESN), for generating surrogates of nonlinear ordinary differential equations with dynamics at widely separated timescales. We empirically demonstrate near-constant time performance using our CTESNs on a physically motivated scalable model of a heating system whose full execution time increases exponentially, while maintaining a relative error within 0.2%. We also show that our model captures fast transients as well as slow dynamics effectively, while other techniques such as physics informed neural networks have difficulties trying to train and predict the highly nonlinear behavior of these models.

Submitted 24 March, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

arXiv:2007.12158 [pdf, other]

Signal Enhancement for Magnetic Navigation Challenge Problem

Authors: Albert R. Gnadt, Joseph Belarge, Aaron Canciani, Glenn Carl, Lauren Conger, Joseph Curro, Alan Edelman, Peter Morales, Aaron P. Nielsen, Michael F. O'Keeffe, Christopher V. Rackauckas, Jonathan Taylor, Allan B. Wollaber

Abstract: Harnessing the magnetic field of the Earth for navigation has shown promise as a viable alternative to other navigation systems. A magnetic navigation system collects its own magnetic field data using a magnetometer and uses magnetic anomaly maps to determine the current location. The greatest challenge with magnetic navigation arises when the magnetic field measurements from the magnetometer encompass the magnetic field from not just the Earth, but also from the vehicle on which it is mounted. It is difficult to separate the Earth magnetic anomaly field, which is crucial for navigation, from the total magnetic field reading from the sensor. The purpose of this challenge problem is to decouple the Earth and aircraft magnetic signals in order to derive a clean signal from which to perform magnetic navigation. Baseline testing on the dataset has shown that the Earth magnetic field can be extracted from the total magnetic field using machine learning (ML). The challenge is to remove the aircraft magnetic field from the total magnetic field using a trained model. This challenge offers an opportunity to construct an effective model for removing the aircraft magnetic field from the dataset by using a scientific machine learning (SciML) approach comprised of an ML algorithm integrated with the physics of magnetic navigation.

Submitted 6 January, 2023; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: 12 pages, 2 figures. See https://github.com/MIT-AI-Accelerator/MagNav.jl for accompanying data and code

arXiv:2001.04385 [pdf, other]

Universal Differential Equations for Scientific Machine Learning

Authors: Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, Ali Ramadhan, Alan Edelman

Abstract: In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." In this manuscript we introduce the SciML software ecosystem as a tool for mixing the information of physical laws and scientific models with data-driven machine learning approaches. We describe a mathematical object, which we denote universal differential equations (UDEs), as the unifying framework connecting the ecosystem. We show how a wide variety of applications, from automatically discovering biological mechanisms to solving high-dimensional Hamilton-Jacobi-Bellman equations, can be phrased and efficiently handled through the UDE formalism and its tooling. We demonstrate the generality of the software tooling to handle stochasticity, delays, and implicit constraints. This funnels the wide variety of SciML applications into a core set of training mechanisms which are highly optimized, stabilized for stiff equations, and compatible with distributed parallelism and GPU accelerators.

Submitted 2 November, 2021; v1 submitted 13 January, 2020; originally announced January 2020.

Comments: 5 figures, 2 tables, 11 supplemental figures, 29 pages, 25 supplemental pages

arXiv:1907.07587 [pdf, other]

A Differentiable Programming System to Bridge Machine Learning and Scientific Computing

Authors: Mike Innes, Alan Edelman, Keno Fischer, Chris Rackauckas, Elliot Saba, Viral B Shah, Will Tebbutt

Abstract: Scientific computing is increasingly incorporating the advancements in machine learning and the ability to work with large amounts of data. At the same time, machine learning models are becoming increasingly sophisticated and exhibit many features often seen in scientific computing, stressing the capabilities of machine learning frameworks. Just as the disciplines of scientific computing and machine learning have shared common underlying infrastructure in the form of numerical linear algebra, we now have the opportunity to further share new computational infrastructure, and thus ideas, in the form of Differentiable Programming. We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables us to incorporate a large ecosystem of libraries in our models in a straightforward way. We discuss our approach to automatic differentiation, including its support for advanced techniques such as mixed-mode, complex and checkpointed differentiation, and present several examples of differentiating programs.

Submitted 18 July, 2019; v1 submitted 17 July, 2019; originally announced July 2019.

Comments: Submitted to NeurIPS 2019

arXiv:1808.03374 [pdf, other]

Fast computation of the principal components of genotype matrices in Julia

Authors: Jiahao Chen, Andreas Noack, Alan Edelman

Abstract: Finding the largest few principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix model for matrices that arise in GWASs, showing that the singular values have a bulk behavior that obeys a Marchenko-Pastur distribution with a handful of large outliers. We also implement Golub-Kahan-Lanczos (GKL) bidiagonalization in the Julia programming language, providing thick restarting and a choice between full and partial reorthogonalization strategies to control numerical roundoff. Our implementation of GKL bidiagonalization is up to 36 times faster than software tools used commonly in genomics data analysis for computing principal components, such as EIGENSOFT and FlashPCA, which use dense LAPACK routines and randomized subspace iteration respectively.

Submitted 9 August, 2018; originally announced August 2018.

Comments: 15 pages, 6 figures, 3 tables, repository at https://github.com/jiahao/paper-copper-2016

MSC Class: 15A18 ACM Class: G.1.3

arXiv:1807.05308 [pdf, other]

doi 10.1109/HPEC.2018.8547577

TabulaROSA: Tabular Operating System Architecture for Massively Parallel Heterogeneous Compute Engines

Authors: Jeremy Kepner, Ron Brightwell, Alan Edelman, Vijay Gadepally, Hayden Jananthan, Michael Jones, Sam Madden, Peter Michaleas, Hamed Okhravi, Kevin Pedretti, Albert Reuther, Thomas Sterling, Mike Stonebraker

Abstract: The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In this context, an operating system can be viewed as software that brokers and tracks the resources of the compute engines and is akin to a database management system. To explore the idea of using a database in an operating system role, this work defines key operating system functions in terms of rigorous mathematical semantics (associative array algebra) that are directly translatable into database operations. These operations possess a number of mathematical properties that are ideal for parallel operating systems by guaranteeing correctness over a wide range of parallel operations. The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabulaROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance as compared to Linux while managing 2000x more processes in fully searchable tables.

Submitted 13 July, 2018; originally announced July 2018.

Comments: 8 pages, 6 figures, accepted at IEEE HPEC 2018

arXiv:1612.08825 [pdf, other]

Accelerated Convolutions for Efficient Multi-Scale Time to Contact Computation in Julia

Authors: Alexander Amini, Berthold Horn, Alan Edelman

Abstract: Convolutions have long been regarded as fundamental to applied mathematics, physics and engineering. Their mathematical elegance allows for common tasks such as numerical differentiation to be computed efficiently on large data sets. Efficient computation of convolutions is critical to artificial intelligence in real-time applications, like machine vision, where convolutions must be continuously and efficiently computed on tens to hundreds of kilobytes per second. In this paper, we explore how convolutions are used in fundamental machine vision applications. We present an accelerated n-dimensional convolution package in the high performance computing language, Julia, and demonstrate its efficacy in solving the time to contact problem for machine vision. Results are measured against synthetically generated videos and quantitatively assessed according to their mean squared error from the ground truth. We achieve over an order of magnitude decrease in compute time and allocated memory for comparable machine vision applications. All code is packaged and integrated into the official Julia Package Manager to be used in various other scenarios.

Submitted 28 December, 2016; originally announced December 2016.

arXiv:1608.04041 [pdf, other]

doi 10.1109/HPEC.2016.7761626

Julia Implementation of the Dynamic Distributed Dimensional Data Model

Authors: Alexander Chen, Alan Edelman, Jeremy Kepner, Vijay Gadepally, Dylan Hutchison

Abstract: Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation.

Submitted 13 August, 2016; originally announced August 2016.

Comments: 7 pages, 16 figures, IEEE HPEC 2016

arXiv:1411.1607 [pdf, other]

Julia: A Fresh Approach to Numerical Computing

Authors: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral B. Shah

Abstract: Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast. Julia questions notions generally held as "laws of nature" by practitioners of numerical computing: 1. High-level dynamic programs have to be slow. 2. One must prototype in one language and then rewrite in another language for speed or deployment, and 3. There are parts of a system for the programmer, and other parts best left untouched as they are built by the experts. We introduce the Julia programming language and its design --- a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, what good computation is really about, recognizes what remains the same after differences are stripped away. Abstractions in mathematics are captured as code through another technique from computer science, generic programming. Julia shows that one can have machine performance without sacrificing human convenience.

Submitted 19 July, 2015; v1 submitted 6 November, 2014; originally announced November 2014.

Comments: 37 pages

arXiv:1410.6449 [pdf, other]

Parallel Prefix Polymorphism Permits Parallelization, Presentation & Proof

Authors: Jiahao Chen, Alan Edelman

Abstract: Polymorphism in programming languages enables code reuse. Here, we show that polymorphism has broad applicability far beyond computations for technical computing: parallelism in distributed computing, presentation of visualizations of runtime data flow, and proofs for formal verification of correctness. The ability to reuse a single codebase for all these purposes provides new ways to understand and verify parallel programs.

Submitted 6 November, 2014; v1 submitted 23 October, 2014; originally announced October 2014.

Comments: 10 pages, 3 figures. Proceedings of HPTCDL, the 1st Workshop on High Performance Technical Computing in Dynamic Languages, November 17, 2014, New Orleans, Louisiana, USA. Supporting Information available at http://jiahao.github.io/parallel-prefix

ACM Class: D.1.3; D.3.2; G.1.0

arXiv:1407.3845 [pdf, other]

doi 10.1145/2627373.2627383

Array operators using multiple dispatch: a design methodology for array implementations in dynamic languages

Authors: Jeff Bezanson, Jiahao Chen, Stefan Karpinski, Viral Shah, Alan Edelman

Abstract: Arrays are such a rich and fundamental data type that they tend to be built into a language, either in the compiler or in a large low-level library. Defining this functionality at the user level instead provides greater flexibility for application domains not envisioned by the language designer. Only a few languages, such as C++ and Haskell, provide the necessary power to define $n$-dimensional arrays, but these systems rely on compile-time abstraction, sacrificing some flexibility. In contrast, dynamic languages make it straightforward for the user to define any behavior they might want, but at the possible expense of performance. As part of the Julia language project, we have developed an approach that yields a novel trade-off between flexibility and compile-time analysis. The core abstraction we use is multiple dispatch. We have come to believe that while multiple dispatch has not been especially popular in most kinds of programming, technical computing is its killer application. By expressing key functions such as array indexing using multi-method signatures, a surprising range of behaviors can be obtained, in a way that is both relatively easy to write and amenable to compiler analysis. The compact factoring of concerns provided by these methods makes it easier for user-defined types to behave consistently with types in the standard library.

Submitted 14 July, 2014; originally announced July 2014.

Comments: 6 pages, 2 figures, workshop paper for the ARRAY '14 workshop, June 11, 2014, Edinburgh, United Kingdom

ACM Class: D.3.3

arXiv:1209.5145 [pdf, other]

Julia: A Fast Dynamic Language for Technical Computing

Authors: Jeff Bezanson, Stefan Karpinski, Viral B. Shah, Alan Edelman

Abstract: Dynamic languages have become popular for scientific computing. They are generally considered highly productive, but lacking in performance. This paper presents Julia, a new dynamic language for technical computing, designed for performance from the beginning by adapting and extending modern programming language techniques. A design based on generic functions and a rich type system simultaneously enables an expressive programming model and successful type inference, leading to good performance for a wide range of programs. This makes it possible for much of the Julia library to be written in Julia itself, while also incorporating best-of-breed C and Fortran libraries.

Submitted 23 September, 2012; originally announced September 2012.

ACM Class: D.3.2

arXiv:1106.4587 [pdf, ps, other]

An Efficient Partitioning Oracle for Bounded-Treewidth Graphs

Authors: Alan Edelman, Avinatan Hassidim, Huy N. Nguyen, Krzysztof Onak

Abstract: Partitioning oracles were introduced by Hassidim et al. (FOCS 2009) as a generic tool for constant-time algorithms. For any epsilon > 0, a partitioning oracle provides query access to a fixed partition of the input bounded-degree minor-free graph, in which every component has size poly(1/epsilon), and the number of edges removed is at most epsilon*n, where n is the number of vertices in the graph. However, the oracle of Hassidim et al. makes an exponential number of queries to the input graph to answer every query about the partition. In this paper, we construct an efficient partitioning oracle for graphs with constant treewidth. The oracle makes only O(poly(1/epsilon)) queries to the input graph to answer each query about the partition. Examples of bounded-treewidth graph classes include k-outerplanar graphs for fixed k, series-parallel graphs, cactus graphs, and pseudoforests. Our oracle yields poly(1/epsilon)-time property testing algorithms for membership in these classes of graphs. Another application of the oracle is a poly(1/epsilon)-time algorithm that approximates the maximum matching size, the minimum vertex cover size, and the minimum dominating set size up to an additive epsilon*n in graphs with bounded treewidth. Finally, the oracle can be used to test in poly(1/epsilon) time whether the input bounded-treewidth graph is k-colorable or perfect.

Submitted 22 June, 2011; originally announced June 2011.

Comments: Full version of a paper to appear in RANDOM 2011

arXiv:0704.3287 [pdf, ps, other]

Sample size cognizant detection of signals in white noise

Authors: N. Raj Rao, Alan Edelman

Abstract: The detection and estimation of signals in noisy, limited data is a problem of interest to many scientific and engineering communities. We present a computationally simple, sample eigenvalue based procedure for estimating the number of high-dimensional signals in white noise when there are relatively few samples. We highlight a fundamental asymptotic limit of sample eigenvalue based detection of weak high-dimensional signals from a limited sample size and discuss its implication for the detection of two closely spaced signals. This motivates our heuristic definition of the 'effective number of identifiable signals.' Numerical simulations are used to demonstrate the consistency of the algorithm with respect to the effective number of signals and the superior performance of the algorithm with respect to Wax and Kailath's "asymptotically consistent" MDL based estimator.

Submitted 24 April, 2007; originally announced April 2007.

Comments: To appear in the Proceedings of the 8th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Helsinki, Finland, June 17-20, 2007

Showing 1–28 of 28 results for author: Edelman, A