-
Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints
Authors:
Utkarsh Utkarsh,
Pengfei Cai,
Alan Edelman,
Rafael Gomez-Bombarelli,
Christopher Vincent Rackauckas
Abstract:
Deep generative models have recently been applied to physical systems governed by partial differential equations (PDEs), offering scalable simulation and uncertainty-aware inference. However, enforcing physical constraints, such as conservation laws (linear and nonlinear) and physical consistencies, remains challenging. Existing methods often rely on soft penalties or architectural biases that fail to guarantee hard constraints. In this work, we propose Physics-Constrained Flow Matching (PCFM), a zero-shot inference framework that enforces arbitrary nonlinear constraints in pretrained flow-based generative models. PCFM continuously guides the sampling process through physics-based corrections applied to intermediate solution states, while remaining aligned with the learned flow and satisfying physical constraints. Empirically, PCFM outperforms both unconstrained and constrained baselines on a range of PDEs, including those with shocks, discontinuities, and sharp features, while ensuring exact constraint satisfaction at the final solution. Our method provides a general framework for enforcing hard constraints in both scientific and general-purpose generative models, especially in applications where constraint satisfaction is essential.
Submitted 4 June, 2025;
originally announced June 2025.
-
Semi-Explicit Neural DAEs: Learning Long-Horizon Dynamical Systems with Algebraic Constraints
Authors:
Avik Pal,
Alan Edelman,
Christopher Rackauckas
Abstract:
Despite the promise of scientific machine learning (SciML) in combining data-driven techniques with mechanistic modeling, existing approaches for incorporating hard constraints in neural differential equations (NDEs) face significant limitations. Scalability issues and poor numerical properties prevent these neural models from being used for modeling physical systems with complicated conservation laws. We propose Manifold-Projected Neural ODEs (PNODEs), a method that explicitly enforces algebraic constraints by projecting each ODE step onto the constraint manifold. This framework arises naturally from semi-explicit differential-algebraic equations (DAEs), and includes both a robust iterative variant and a fast approximation requiring a single Jacobian factorization. We further demonstrate that prior works on relaxation methods are special cases of our approach. PNODEs consistently outperform baselines across six benchmark problems achieving a mean constraint violation error below $10^{-10}$. Additionally, PNODEs consistently achieve lower runtime compared to other methods for a given level of error tolerance. These results show that constraint projection offers a simple strategy for learning physically consistent long-horizon dynamics.
Submitted 26 May, 2025;
originally announced May 2025.
-
Toward Portable GPU Performance: Julia Recursive Implementation of TRMM and TRSM
Authors:
Vicki Carrica,
Maxwell Onyango,
Rabab Alomairy,
Evelyne Ringoot,
James Schloss,
Alan Edelman
Abstract:
This paper presents a performant and portable recursive implementation of triangular matrix-matrix multiplication (TRMM) and triangular solve (TRSM) in Julia for GPUs, two kernels that underlie many linear-algebra algorithms. We restructure TRMM and TRSM so that most work is executed as general matrix-matrix multiplication (GEMM), improving use of the GPU memory hierarchy and reducing latency. Exploiting Julia's multiple dispatch and metaprogramming together with the GPUArrays and KernelAbstractions frameworks, we expose a single hardware-agnostic API that runs on NVIDIA, AMD, and Apple Silicon GPUs. For large matrices the recursive code reaches throughput comparable to vendor libraries such as cuBLAS and rocBLAS, while providing these routines on Apple Silicon for the first time. The entire implementation is only a few hundred lines of code, showing that unified Julia programs can deliver near-vendor performance across heterogeneous architectures.
Submitted 18 April, 2025;
originally announced April 2025.
-
Matrix Calculus (for Machine Learning and Beyond)
Authors:
Paige Bright,
Alan Edelman,
Steven G. Johnson
Abstract:
This course, intended for undergraduates familiar with elementary calculus and linear algebra, introduces the extension of differential calculus to functions on more general vector spaces, such as functions that take as input a matrix and return a matrix inverse or factorization, derivatives of ODE solutions, and even stochastic derivatives of random functions. It emphasizes practical computational applications, such as large-scale optimization and machine learning, where derivatives must be re-imagined in order to be propagated through complicated calculations. The class also discusses efficiency concerns leading to "adjoint" or "reverse-mode" differentiation (a.k.a. "backpropagation"), and gives a gentle introduction to modern automatic differentiation (AD) techniques.
Submitted 7 January, 2025;
originally announced January 2025.
-
Oceananigans.jl: A Julia library that achieves breakthrough resolution, memory and energy efficiency in global ocean simulations
Authors:
Simone Silvestri,
Gregory L. Wagner,
Christopher Hill,
Matin Raayai Ardakani,
Johannes Blaschke,
Jean-Michel Campin,
Valentin Churavy,
Navid C. Constantinou,
Alan Edelman,
John Marshall,
Ali Ramadhan,
Andre Souza,
Raffaele Ferrari
Abstract:
Climate models must simulate hundreds of future scenarios for hundreds of years at coarse resolutions, and a handful of high-resolution decadal simulations to resolve localized extreme events. Using Oceananigans.jl, written from scratch in Julia, we report several achievements: First, a global ocean simulation with breakthrough horizontal resolution -- 488m -- reaching 15 simulated days per day (0.04 simulated years per day; SYPD). Second, Oceananigans simulates the global ocean at 488m with breakthrough memory efficiency on just 768 Nvidia A100 GPUs, a fraction of the resources available on current and upcoming exascale supercomputers. Third, and arguably most significant for climate modeling, Oceananigans achieves breakthrough energy efficiency reaching 0.95 SYPD at 1.7 km on 576 A100s and 9.9 SYPD at 10 km on 68 A100s -- the latter representing the highest horizontal resolutions employed by current IPCC-class ocean models. Routine climate simulations with 10 km ocean components are within reach.
Submitted 14 October, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Automated Translation and Accelerated Solving of Differential Equations on Multiple GPU Platforms
Authors:
Utkarsh Utkarsh,
Valentin Churavy,
Yingbo Ma,
Tim Besard,
Prakitr Srisuma,
Tim Gymnich,
Adam R. Gerlach,
Alan Edelman,
George Barbastathis,
Richard D. Braatz,
Christopher Rackauckas
Abstract:
We demonstrate a high-performance vendor-agnostic method for massively parallel solving of ensembles of ordinary differential equations (ODEs) and stochastic differential equations (SDEs) on GPUs. The method is integrated with a widely used differential equation solver library in a high-level language (Julia's DifferentialEquations.jl) and enables GPU acceleration without requiring code changes by the user. Our approach achieves state-of-the-art performance compared to hand-optimized CUDA-C++ kernels while performing 20--100$\times$ faster than the vectorizing map (vmap) approach implemented in JAX and PyTorch. Performance evaluation on NVIDIA, AMD, Intel, and Apple GPUs demonstrates performance portability and vendor-agnosticism. We show composability with MPI to enable distributed multi-GPU workflows. The implemented solvers are fully featured -- supporting event handling, automatic differentiation, and incorporation of datasets via the GPU's texture memory -- allowing scientists to take advantage of GPU acceleration on all major current architectures without changing their model code and without loss of performance. We distribute the software as an open-source library https://github.com/SciML/DiffEqGPU.jl
Submitted 13 November, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Backpropagation through Back Substitution with a Backslash
Authors:
Alan Edelman,
Ekin Akyurek,
Yuyang Wang
Abstract:
We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written "backslash" or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach; (ii) operators can be readily placed in matrices in software in programming languages such as Julia as an implementation option; (iii) we introduce a novel notation, "transpose dot" operator "$\{\}^{T_\bullet}$" that allows for the reversal of operators.
We further demonstrate the elegance of the operators approach in a suitable programming language consisting of generic linear algebra operators such as Julia \cite{bezanson2017julia}, and that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices. In contrast to "operator overloading," where backslash would normally have to be rewritten to take advantage of operators, with "generic programming" there is no such need.
Submitted 30 August, 2023; v1 submitted 23 February, 2023;
originally announced March 2023.
-
Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed!
Authors:
Avik Pal,
Alan Edelman,
Chris Rackauckas
Abstract:
Implicit layer deep learning techniques, like Neural Differential Equations, have become an important modeling framework due to their ability to adapt to new problems automatically. Training a neural differential equation is effectively a search over a space of plausible dynamical systems. However, controlling the computational cost for these models is difficult since it relies on the number of steps the adaptive solver takes. Most prior works have used higher-order methods to reduce prediction timings while greatly increasing training time or reducing both training and prediction timings by relying on specific training algorithms, which are harder to use as a drop-in replacement due to strict requirements on automatic differentiation. In this manuscript, we use internal cost heuristics of adaptive differential equation solvers at stochastic time points to guide the training toward learning a dynamical system that is easier to integrate. We "close the black-box" and allow the use of our method with any adjoint technique for gradient calculations of the differential equation solution. We perform experimental studies to compare our method to global regularization to show that we attain similar performance numbers without compromising the flexibility of implementation on ordinary differential equations (ODEs) and stochastic differential equations (SDEs). We develop two sampling strategies to trade off between performance and training time. Our method reduces the number of function evaluations to 0.556-0.733x and accelerates predictions by 1.3-2x.
Submitted 2 June, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Bridging HPC Communities through the Julia Programming Language
Authors:
Valentin Churavy,
William F Godoy,
Carsten Bauer,
Hendrik Ranocha,
Michael Schlottke-Lakemper,
Ludovic Räss,
Johannes Blaschke,
Mosè Giordano,
Erik Schnetter,
Samuel Omlin,
Jeffrey S. Vetter,
Alan Edelman
Abstract:
The Julia programming language has evolved into a modern alternative to fill existing gaps in scientific computing and data science applications. Julia leverages a unified and coordinated single-language and ecosystem paradigm and has a proven track record of achieving high performance without sacrificing user productivity. These aspects make Julia a viable alternative to high-performance computing's (HPC's) existing and increasingly costly many-body workflow composition strategy in which traditional HPC languages (e.g., Fortran, C, C++) are used for simulations, and higher-level languages (e.g., Python, R, MATLAB) are used for data analysis and interactive computing. Julia's rapid growth in language capabilities, package ecosystem, and community make it a promising universal language for HPC. This paper presents the views of a multidisciplinary group of researchers from academia, government, and industry that advocate for an HPC software development paradigm that emphasizes developer productivity, workflow portability, and low barriers for entry. We believe that the Julia programming language, its ecosystem, and its community provide modern and powerful capabilities that enable this group's objectives. Crucially, we believe that Julia can provide a feasible and less costly approach to programming scientific applications and workflows that target HPC facilities. In this work, we examine the current practice and role of Julia as a common, end-to-end programming model to address major challenges in scientific reproducibility, data-driven AI/machine learning, co-design and workflows, scalability and performance portability in heterogeneous computing, network communication, data management, and community education. As a result, the diversification of current investments to fulfill the needs of the upcoming decade is crucial as more supercomputing centers prepare for the exascale era.
Submitted 10 November, 2022; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Continuous Deep Equilibrium Models: Training Neural ODEs faster by integrating them to Infinity
Authors:
Avik Pal,
Alan Edelman,
Christopher Rackauckas
Abstract:
Implicit models separate the definition of a layer from the description of its solution process. While implicit layers allow features such as depth to adapt to new scenarios and inputs automatically, this adaptivity makes its computational expense challenging to predict. In this manuscript, we increase the "implicitness" of the DEQ by redefining the method in terms of an infinite time neural ODE, which paradoxically decreases the training cost over a standard neural ODE by 2-4x. Additionally, we address the question: is there a way to simultaneously achieve the robustness of implicit layers while allowing the reduced computational expense of an explicit layer? To solve this, we develop Skip and Skip Reg. DEQ, an implicit-explicit (IMEX) layer that simultaneously trains an explicit prediction followed by an implicit correction. We show that training this explicit predictor is free and even decreases the training time by 1.11-3.19x. Together, this manuscript shows how bridging the dichotomy of implicit and explicit deep learning can combine the advantages of both techniques.
Submitted 3 March, 2023; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Composing Modeling and Simulation with Machine Learning in Julia
Authors:
Chris Rackauckas,
Ranjan Anantharaman,
Alan Edelman,
Shashi Gowda,
Maja Gwozdz,
Anand Jain,
Chris Laughman,
Yingbo Ma,
Francesco Martinuzzi,
Avik Pal,
Utkarsh Rajput,
Elliot Saba,
Viral B. Shah
Abstract:
In this paper we introduce JuliaSim, a high-performance programming environment designed to blend traditional modeling and simulation with machine learning. JuliaSim can build accelerated surrogates from component-based models, such as those conforming to the FMI standard, using continuous-time echo state networks (CTESN). The foundation of this environment, ModelingToolkit.jl, is an acausal modeling language which can compose the trained surrogates as components within its staged compilation process. As a complementary factor we present the JuliaSim model library, a standard library with differential-algebraic equations and pre-trained surrogates, which can be composed using the modeling system for design, optimization, and control. We demonstrate the effectiveness of the surrogate-accelerated modeling and simulation approach on HVAC dynamics by showing that the CTESN surrogates accurately capture the dynamics of an HVAC cycle with less than 4% error while accelerating its simulation by 340x. We illustrate the use of surrogate acceleration in the design process via global optimization of simulation parameters using the embedded surrogate, yielding a speedup of two orders of magnitude to find the optimum. We showcase the surrogate deployed in a co-simulation loop, as a drop-in replacement for one of the coupled FMUs, allowing engineers to effectively explore the design space of a coupled system. Together this demonstrates a workflow for automating the integration of machine learning techniques into traditional modeling and simulation processes.
Submitted 12 May, 2021;
originally announced May 2021.
-
High-performance symbolic-numerics via multiple dispatch
Authors:
Shashi Gowda,
Yingbo Ma,
Alessandro Cheli,
Maja Gwozdz,
Viral B. Shah,
Alan Edelman,
Christopher Rackauckas
Abstract:
As mathematical computing becomes more democratized in high-level languages, high-performance symbolic-numeric systems are necessary for domain scientists and engineers to get the best performance out of their machine without deep knowledge of code optimization. Naturally, users need different term types either to have different algebraic properties for them, or to use efficient data structures. To this end, we developed Symbolics.jl, an extendable symbolic system which uses dynamic multiple dispatch to change behavior depending on the domain needs. In this work we detail an underlying abstract term interface which allows for speed without sacrificing generality. We show that by formalizing a generic API on actions independent of implementation, we can retroactively add optimized data structures to our system without changing the pre-existing term rewriters. We showcase how this can be used to optimize term construction and give a 113x acceleration on general symbolic transformations. Further, we show that such a generic API allows for complementary term-rewriting implementations. We demonstrate the ability to swap between classical term-rewriting simplifiers and e-graph-based term-rewriting simplifiers. We showcase an e-graph ruleset which minimizes the number of CPU cycles during expression evaluation, and demonstrate how it simplifies a real-world reaction-network simulation to halve the runtime. Additionally, we show a reaction-diffusion partial differential equation solver which is able to be automatically converted into symbolic expressions via multiple dispatch tracing, which is subsequently accelerated and parallelized to give a 157x simulation speedup. Together, this presents Symbolics.jl as a next-generation symbolic-numeric computing environment geared towards modeling and simulation.
Submitted 5 February, 2022; v1 submitted 9 May, 2021;
originally announced May 2021.
-
StudyU: a platform for designing and conducting innovative digital N-of-1 trials
Authors:
Stefan Konigorski,
Sarah Wernicke,
Tamara Slosarek,
Alexander M. Zenner,
Nils Strelow,
Ferenc D. Ruether,
Florian Henschel,
Manisha Manaswini,
Fabian Pottbäcker,
Jonathan A. Edelman,
Babajide Owoyele,
Matteo Danieletto,
Eddye Golden,
Micol Zweig,
Girish Nadkarni,
Erwin Böttinger
Abstract:
N-of-1 trials are the gold standard study design to evaluate individual treatment effects and derive personalized treatment strategies. Digital tools have the potential to initiate a new era of N-of-1 trials in terms of scale and scope, but fully-functional platforms are not yet available. Here, we present the open source StudyU platform which includes the StudyU designer and StudyU app. With the StudyU designer, scientists are given a collaborative web application to digitally specify, publish, and conduct N-of-1 trials. The StudyU app is a smartphone application with innovative user-centric elements for participants to partake in the published trials and assess the effects of different interventions on their health. Thereby, the StudyU platform allows clinicians and researchers worldwide to easily design and conduct digital N-of-1 trials in a safe manner. We envision that StudyU can change the landscape of personalized treatments both for patients and healthy individuals, democratize and personalize evidence generation for self-optimization and medicine, and can be integrated in clinical practice.
Submitted 12 July, 2021; v1 submitted 28 December, 2020;
originally announced December 2020.
-
AutoMat: Accelerated Computational Electrochemical systems Discovery
Authors:
Emil Annevelink,
Rachel Kurchin,
Eric Muckley,
Lance Kavalsky,
Vinay I. Hegde,
Valentin Sulzer,
Shang Zhu,
Jiankun Pu,
David Farina,
Matthew Johnson,
Dhairya Gandhi,
Adarsh Dave,
Hongyi Lin,
Alan Edelman,
Bharath Ramsundar,
James Saal,
Christopher Rackauckas,
Viral Shah,
Bryce Meredig,
Venkatasubramanian Viswanathan
Abstract:
Large-scale electrification is vital to addressing the climate crisis, but several scientific and technological challenges remain to fully electrify both the chemical industry and transportation. In both of these areas, new electrochemical materials will be critical, but their development currently relies heavily on human-time-intensive experimental trial and error and computationally expensive first-principles, meso-scale and continuum simulations. We present an automated workflow, AutoMat, that accelerates these computational steps by introducing both automated input generation and management of simulations across scales from first principles to continuum device modeling. Furthermore, we show how to seamlessly integrate multi-fidelity predictions such as machine learning surrogates or automated robotic experiments "in-the-loop". The automated framework is implemented with design space search techniques to dramatically accelerate the overall materials discovery pipeline by implicitly learning design features that optimize device performance across several metrics. We discuss the benefits of AutoMat using examples in electrocatalysis and energy storage and highlight lessons learned.
Submitted 13 May, 2022; v1 submitted 3 November, 2020;
originally announced November 2020.
-
Accelerating Simulation of Stiff Nonlinear Systems using Continuous-Time Echo State Networks
Authors:
Ranjan Anantharaman,
Yingbo Ma,
Shashi Gowda,
Chris Laughman,
Viral Shah,
Alan Edelman,
Chris Rackauckas
Abstract:
Modern design, control, and optimization often require simulation of highly nonlinear models, leading to prohibitive computational costs. These costs can be amortized by evaluating a cheap surrogate of the full model. Here we present a general data-driven method, the continuous-time echo state network (CTESN), for generating surrogates of nonlinear ordinary differential equations with dynamics at widely separated timescales. We empirically demonstrate near-constant time performance using our CTESNs on a physically motivated scalable model of a heating system whose full execution time increases exponentially, while maintaining a relative error within 0.2%. We also show that our model captures fast transients as well as slow dynamics effectively, while other techniques such as physics-informed neural networks have difficulty training on and predicting the highly nonlinear behavior of these models.
Submitted 24 March, 2021; v1 submitted 7 October, 2020;
originally announced October 2020.
-
Signal Enhancement for Magnetic Navigation Challenge Problem
Authors:
Albert R. Gnadt,
Joseph Belarge,
Aaron Canciani,
Glenn Carl,
Lauren Conger,
Joseph Curro,
Alan Edelman,
Peter Morales,
Aaron P. Nielsen,
Michael F. O'Keeffe,
Christopher V. Rackauckas,
Jonathan Taylor,
Allan B. Wollaber
Abstract:
Harnessing the magnetic field of the Earth for navigation has shown promise as a viable alternative to other navigation systems. A magnetic navigation system collects its own magnetic field data using a magnetometer and uses magnetic anomaly maps to determine the current location. The greatest challenge with magnetic navigation arises when the magnetic field measurements from the magnetometer encompass the magnetic field from not just the Earth, but also from the vehicle on which it is mounted. It is difficult to separate the Earth magnetic anomaly field, which is crucial for navigation, from the total magnetic field reading from the sensor. The purpose of this challenge problem is to decouple the Earth and aircraft magnetic signals in order to derive a clean signal from which to perform magnetic navigation. Baseline testing on the dataset has shown that the Earth magnetic field can be extracted from the total magnetic field using machine learning (ML). The challenge is to remove the aircraft magnetic field from the total magnetic field using a trained model. This challenge offers an opportunity to construct an effective model for removing the aircraft magnetic field from the dataset by using a scientific machine learning (SciML) approach comprised of an ML algorithm integrated with the physics of magnetic navigation.
Submitted 6 January, 2023; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Universal Differential Equations for Scientific Machine Learning
Authors:
Christopher Rackauckas,
Yingbo Ma,
Julius Martensen,
Collin Warner,
Kirill Zubov,
Rohit Supekar,
Dominic Skinner,
Ali Ramadhan,
Alan Edelman
Abstract:
In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." In this manuscript we introduce the SciML software ecosystem as a tool for mixing the information of physical laws and scientific models with data-driven machine learning approaches. We describe a mathematical object, which we denote universal differential equations (UDEs), as the unifying framework connecting the ecosystem. We show how a wide variety of applications, from automatically discovering biological mechanisms to solving high-dimensional Hamilton-Jacobi-Bellman equations, can be phrased and efficiently handled through the UDE formalism and its tooling. We demonstrate the generality of the software tooling to handle stochasticity, delays, and implicit constraints. This funnels the wide variety of SciML applications into a core set of training mechanisms which are highly optimized, stabilized for stiff equations, and compatible with distributed parallelism and GPU accelerators.
Submitted 2 November, 2021; v1 submitted 13 January, 2020;
originally announced January 2020.
-
A Differentiable Programming System to Bridge Machine Learning and Scientific Computing
Authors:
Mike Innes,
Alan Edelman,
Keno Fischer,
Chris Rackauckas,
Elliot Saba,
Viral B Shah,
Will Tebbutt
Abstract:
Scientific computing is increasingly incorporating the advancements in machine learning and the ability to work with large amounts of data. At the same time, machine learning models are becoming increasingly sophisticated and exhibit many features often seen in scientific computing, stressing the capabilities of machine learning frameworks. Just as the disciplines of scientific computing and machine learning have shared common underlying infrastructure in the form of numerical linear algebra, we now have the opportunity to further share new computational infrastructure, and thus ideas, in the form of Differentiable Programming. We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables us to incorporate a large ecosystem of libraries in our models in a straightforward way. We discuss our approach to automatic differentiation, including its support for advanced techniques such as mixed-mode, complex and checkpointed differentiation, and present several examples of differentiating programs.
Submitted 18 July, 2019; v1 submitted 17 July, 2019;
originally announced July 2019.
-
Fast computation of the principal components of genotype matrices in Julia
Authors:
Jiahao Chen,
Andreas Noack,
Alan Edelman
Abstract:
Finding the largest few principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix model for matrices that arise in GWASs, showing that the singular values have a bulk behavior that obeys a Marchenko-Pastur distribution with a handful of large outliers. We also implement Golub-Kahan-Lanczos (GKL) bidiagonalization in the Julia programming language, providing thick restarting and a choice between full and partial reorthogonalization strategies to control numerical roundoff. Our implementation of GKL bidiagonalization is up to 36 times faster than software tools used commonly in genomics data analysis for computing principal components, such as EIGENSOFT and FlashPCA, which use dense LAPACK routines and randomized subspace iteration respectively.
Submitted 9 August, 2018;
originally announced August 2018.
-
TabulaROSA: Tabular Operating System Architecture for Massively Parallel Heterogeneous Compute Engines
Authors:
Jeremy Kepner,
Ron Brightwell,
Alan Edelman,
Vijay Gadepally,
Hayden Jananthan,
Michael Jones,
Sam Madden,
Peter Michaleas,
Hamed Okhravi,
Kevin Pedretti,
Albert Reuther,
Thomas Sterling,
Mike Stonebraker
Abstract:
The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In this context, an operating system can be viewed as software that brokers and tracks the resources of the compute engines and is akin to a database management system. To explore the idea of using a database in an operating system role, this work defines key operating system functions in terms of rigorous mathematical semantics (associative array algebra) that are directly translatable into database operations. These operations possess a number of mathematical properties that are ideal for parallel operating systems by guaranteeing correctness over a wide range of parallel operations. The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabulaROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance as compared to Linux while managing 2000x more processes in fully searchable tables.
Submitted 13 July, 2018;
originally announced July 2018.
-
Accelerated Convolutions for Efficient Multi-Scale Time to Contact Computation in Julia
Authors:
Alexander Amini,
Berthold Horn,
Alan Edelman
Abstract:
Convolutions have long been regarded as fundamental to applied mathematics, physics and engineering. Their mathematical elegance allows for common tasks such as numerical differentiation to be computed efficiently on large data sets. Efficient computation of convolutions is critical to artificial intelligence in real-time applications, like machine vision, where convolutions must be continuously and efficiently computed on tens to hundreds of kilobytes per second. In this paper, we explore how convolutions are used in fundamental machine vision applications. We present an accelerated n-dimensional convolution package in the high performance computing language, Julia, and demonstrate its efficacy in solving the time to contact problem for machine vision. Results are measured against synthetically generated videos and quantitatively assessed according to their mean squared error from the ground truth. We achieve over an order of magnitude decrease in compute time and allocated memory for comparable machine vision applications. All code is packaged and integrated into the official Julia Package Manager to be used in various other scenarios.
Submitted 28 December, 2016;
originally announced December 2016.
-
Julia Implementation of the Dynamic Distributed Dimensional Data Model
Authors:
Alexander Chen,
Alan Edelman,
Jeremy Kepner,
Vijay Gadepally,
Dylan Hutchison
Abstract:
Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation.
Submitted 13 August, 2016;
originally announced August 2016.
-
Julia: A Fresh Approach to Numerical Computing
Authors:
Jeff Bezanson,
Alan Edelman,
Stefan Karpinski,
Viral B. Shah
Abstract:
Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast. Julia questions notions generally held as "laws of nature" by practitioners of numerical computing:
1. High-level dynamic programs have to be slow.
2. One must prototype in one language and then rewrite in another language for speed or deployment, and
3. There are parts of a system for the programmer, and other parts best left untouched as they are built by the experts.
We introduce the Julia programming language and its design --- a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, what good computation is really about, recognizes what remains the same after differences are stripped away. Abstractions in mathematics are captured as code through another technique from computer science, generic programming.
Julia shows that one can have machine performance without sacrificing human convenience.
Submitted 19 July, 2015; v1 submitted 6 November, 2014;
originally announced November 2014.
-
Parallel Prefix Polymorphism Permits Parallelization, Presentation & Proof
Authors:
Jiahao Chen,
Alan Edelman
Abstract:
Polymorphism in programming languages enables code reuse. Here, we show that polymorphism has broad applicability far beyond computations for technical computing: parallelism in distributed computing, presentation of visualizations of runtime data flow, and proofs for formal verification of correctness. The ability to reuse a single codebase for all these purposes provides new ways to understand and verify parallel programs.
Submitted 6 November, 2014; v1 submitted 23 October, 2014;
originally announced October 2014.
-
Array operators using multiple dispatch: a design methodology for array implementations in dynamic languages
Authors:
Jeff Bezanson,
Jiahao Chen,
Stefan Karpinski,
Viral Shah,
Alan Edelman
Abstract:
Arrays are such a rich and fundamental data type that they tend to be built into a language, either in the compiler or in a large low-level library. Defining this functionality at the user level instead provides greater flexibility for application domains not envisioned by the language designer. Only a few languages, such as C++ and Haskell, provide the necessary power to define $n$-dimensional arrays, but these systems rely on compile-time abstraction, sacrificing some flexibility. In contrast, dynamic languages make it straightforward for the user to define any behavior they might want, but at the possible expense of performance.
As part of the Julia language project, we have developed an approach that yields a novel trade-off between flexibility and compile-time analysis. The core abstraction we use is multiple dispatch. We have come to believe that while multiple dispatch has not been especially popular in most kinds of programming, technical computing is its killer application. By expressing key functions such as array indexing using multi-method signatures, a surprising range of behaviors can be obtained, in a way that is both relatively easy to write and amenable to compiler analysis. The compact factoring of concerns provided by these methods makes it easier for user-defined types to behave consistently with types in the standard library.
Submitted 14 July, 2014;
originally announced July 2014.
-
Julia: A Fast Dynamic Language for Technical Computing
Authors:
Jeff Bezanson,
Stefan Karpinski,
Viral B. Shah,
Alan Edelman
Abstract:
Dynamic languages have become popular for scientific computing. They are generally considered highly productive, but lacking in performance. This paper presents Julia, a new dynamic language for technical computing, designed for performance from the beginning by adapting and extending modern programming language techniques. A design based on generic functions and a rich type system simultaneously enables an expressive programming model and successful type inference, leading to good performance for a wide range of programs. This makes it possible for much of the Julia library to be written in Julia itself, while also incorporating best-of-breed C and Fortran libraries.
Submitted 23 September, 2012;
originally announced September 2012.
-
An Efficient Partitioning Oracle for Bounded-Treewidth Graphs
Authors:
Alan Edelman,
Avinatan Hassidim,
Huy N. Nguyen,
Krzysztof Onak
Abstract:
Partitioning oracles were introduced by Hassidim et al. (FOCS 2009) as a generic tool for constant-time algorithms. For any epsilon > 0, a partitioning oracle provides query access to a fixed partition of the input bounded-degree minor-free graph, in which every component has size poly(1/epsilon), and the number of edges removed is at most epsilon*n, where n is the number of vertices in the graph.
However, the oracle of Hassidim et al. makes an exponential number of queries to the input graph to answer every query about the partition. In this paper, we construct an efficient partitioning oracle for graphs with constant treewidth. The oracle makes only O(poly(1/epsilon)) queries to the input graph to answer each query about the partition.
Examples of bounded-treewidth graph classes include k-outerplanar graphs for fixed k, series-parallel graphs, cactus graphs, and pseudoforests. Our oracle yields poly(1/epsilon)-time property testing algorithms for membership in these classes of graphs. Another application of the oracle is a poly(1/epsilon)-time algorithm that approximates the maximum matching size, the minimum vertex cover size, and the minimum dominating set size up to an additive epsilon*n in graphs with bounded treewidth. Finally, the oracle can be used to test in poly(1/epsilon) time whether the input bounded-treewidth graph is k-colorable or perfect.
Submitted 22 June, 2011;
originally announced June 2011.
-
Sample size cognizant detection of signals in white noise
Authors:
N. Raj Rao,
Alan Edelman
Abstract:
The detection and estimation of signals in noisy, limited data is a problem of interest to many scientific and engineering communities. We present a computationally simple, sample eigenvalue based procedure for estimating the number of high-dimensional signals in white noise when there are relatively few samples. We highlight a fundamental asymptotic limit of sample eigenvalue based detection of weak high-dimensional signals from a limited sample size and discuss its implication for the detection of two closely spaced signals.
This motivates our heuristic definition of the 'effective number of identifiable signals.' Numerical simulations are used to demonstrate the consistency of the algorithm with respect to the effective number of signals and the superior performance of the algorithm with respect to Wax and Kailath's "asymptotically consistent" MDL based estimator.
Submitted 24 April, 2007;
originally announced April 2007.