-
An Approximation Theory Perspective on Machine Learning
Authors:
Hrushikesh N. Mhaskar,
Efstratios Tsoukanis,
Ameya D. Jagtap
Abstract:
A central problem in machine learning is often formulated as follows: Given a dataset $\{(x_j, y_j)\}_{j=1}^M$, which is a sample drawn from an unknown probability distribution, the goal is to construct a functional model $f$ such that $f(x) \approx y$ for any $(x, y)$ drawn from the same distribution. Neural networks and kernel-based methods are commonly employed for this task due to their capaci…
▽ More
A central problem in machine learning is often formulated as follows: Given a dataset $\{(x_j, y_j)\}_{j=1}^M$, which is a sample drawn from an unknown probability distribution, the goal is to construct a functional model $f$ such that $f(x) \approx y$ for any $(x, y)$ drawn from the same distribution. Neural networks and kernel-based methods are commonly employed for this task due to their capacity for fast and parallel computation. The approximation capabilities, or expressive power, of these methods have been extensively studied over the past 35 years. In this paper, we will present examples of key ideas in this area found in the literature. We will discuss emerging trends in machine learning including the role of shallow/deep networks, approximation on manifolds, physics-informed neural surrogates, neural operators, and transformer architectures. Despite function approximation being a fundamental problem in machine learning, approximation theory does not play a central role in the theoretical foundations of the field. One unfortunate consequence of this disconnect is that it is often unclear how well trained models will generalize to unseen or unlabeled data. In this review, we examine some of the shortcomings of the current machine learning framework and explore the reasons for the gap between approximation theory and machine learning practice. We will then introduce our novel research to achieve function approximation on unknown manifolds without the need to learn specific manifold features, such as the eigen-decomposition of the Laplace-Beltrami operator or atlas construction. In many machine learning problems, particularly classification tasks, the labels $y_j$ are drawn from a finite set of values.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
Anant-Net: Breaking the Curse of Dimensionality with Scalable and Interpretable Neural Surrogate for High-Dimensional PDEs
Authors:
Sidharth S. Menon,
Ameya D. Jagtap
Abstract:
High-dimensional partial differential equations (PDEs) arise in diverse scientific and engineering applications but remain computationally intractable due to the curse of dimensionality. Traditional numerical methods struggle with the exponential growth in computational complexity, particularly on hypercubic domains, where the number of required collocation points increases rapidly with dimensiona…
▽ More
High-dimensional partial differential equations (PDEs) arise in diverse scientific and engineering applications but remain computationally intractable due to the curse of dimensionality. Traditional numerical methods struggle with the exponential growth in computational complexity, particularly on hypercubic domains, where the number of required collocation points increases rapidly with dimensionality. Here, we introduce Anant-Net, an efficient neural surrogate that overcomes this challenge, enabling the solution of PDEs in high dimensions. Unlike hyperspheres, where the internal volume diminishes as dimensionality increases, hypercubes retain or expand their volume (for unit or larger length), making high-dimensional computations significantly more demanding. Anant-Net efficiently incorporates high-dimensional boundary conditions and minimizes the PDE residual at high-dimensional collocation points. To enhance interpretability, we integrate Kolmogorov-Arnold networks into the Anant-Net architecture. We benchmark Anant-Net's performance on several linear and nonlinear high-dimensional equations, including the Poisson, Sine-Gordon, and Allen-Cahn equations, demonstrating high accuracy and robustness across randomly sampled test points from high-dimensional space. Importantly, Anant-Net achieves these results with remarkable efficiency, solving 300-dimensional problems on a single GPU within a few hours. We also compare Anant-Net's results for accuracy and runtime with other state-of-the-art methods. Our findings establish Anant-Net as an accurate, interpretable, and scalable framework for efficiently solving high-dimensional PDEs.
△ Less
Submitted 7 May, 2025; v1 submitted 6 May, 2025;
originally announced May 2025.
-
Systematic Literature Review on Vehicular Collaborative Perception -- A Computer Vision Perspective
Authors:
Lei Wan,
Jianxin Zhao,
Andreas Wiedholz,
Manuel Bied,
Mateus Martinez de Lucena,
Abhishek Dinkar Jagtap,
Andreas Festag,
Antônio Augusto Fröhlich,
Hannan Ejaz Keen,
Alexey Vinel
Abstract:
The effectiveness of autonomous vehicles relies on reliable perception capabilities. Despite significant advancements in artificial intelligence and sensor fusion technologies, current single-vehicle perception systems continue to encounter limitations, notably visual occlusions and limited long-range detection capabilities. Collaborative Perception (CP), enabled by Vehicle-to-Vehicle (V2V) and Ve…
▽ More
The effectiveness of autonomous vehicles relies on reliable perception capabilities. Despite significant advancements in artificial intelligence and sensor fusion technologies, current single-vehicle perception systems continue to encounter limitations, notably visual occlusions and limited long-range detection capabilities. Collaborative Perception (CP), enabled by Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication, has emerged as a promising solution to mitigate these issues and enhance the reliability of autonomous systems. Beyond advancements in communication, the computer vision community is increasingly focusing on improving vehicular perception through collaborative approaches. However, a systematic literature review that thoroughly examines existing work and reduces subjective bias is still lacking. Such a systematic approach helps identify research gaps, recognize common trends across studies, and inform future research directions. In response, this study follows the PRISMA 2020 guidelines and includes 106 peer-reviewed articles. These publications are analyzed based on modalities, collaboration schemes, and key perception tasks. Through a comparative analysis, this review illustrates how different methods address practical issues such as pose errors, temporal latency, communication constraints, domain shifts, heterogeneity, and adversarial attacks. Furthermore, it critically examines evaluation methodologies, highlighting a misalignment between current metrics and CP's fundamental objectives. By delving into all relevant topics in-depth, this review offers valuable insights into challenges, opportunities, and risks, serving as a reference for advancing research in vehicular collaborative perception.
△ Less
Submitted 6 April, 2025;
originally announced April 2025.
-
Challenges and Advancements in Modeling Shock Fronts with Physics-Informed Neural Networks: A Review and Benchmarking Study
Authors:
Jassem Abbasi,
Ameya D. Jagtap,
Ben Moseley,
Aksel Hiorth,
Pål Østebø Andersen
Abstract:
Solving partial differential equations (PDEs) with discontinuous solutions , such as shock waves in multiphase viscous flow in porous media , is critical for a wide range of scientific and engineering applications, as they represent sudden changes in physical quantities. Physics-Informed Neural Networks (PINNs), an approach proposed for solving PDEs, encounter significant challenges when applied t…
▽ More
Solving partial differential equations (PDEs) with discontinuous solutions , such as shock waves in multiphase viscous flow in porous media , is critical for a wide range of scientific and engineering applications, as they represent sudden changes in physical quantities. Physics-Informed Neural Networks (PINNs), an approach proposed for solving PDEs, encounter significant challenges when applied to such systems. Accurately solving PDEs with discontinuities using PINNs requires specialized techniques to ensure effective solution accuracy and numerical stability. A benchmarking study was conducted on two multiphase flow problems in porous media: the classic Buckley-Leverett (BL) problem and a fully coupled system of equations involving shock waves but with varying levels of solution complexity. The findings show that PM and LM approaches can provide accurate solutions for the BL problem by effectively addressing the infinite gradients associated with shock occurrences. In contrast, AM methods failed to effectively resolve the shock waves. When applied to fully coupled PDEs (with more complex loss landscape), the generalization error in the solutions quickly increased, highlighting the need for ongoing innovation. This study provides a comprehensive review of existing techniques for managing PDE discontinuities using PINNs, offering information on their strengths and limitations. The results underscore the necessity for further research to improve PINNs ability to handle complex discontinuities, particularly in more challenging problems with complex loss landscapes. This includes problems involving higher dimensions or multiphysics systems, where current methods often struggle to maintain accuracy and efficiency.
△ Less
Submitted 14 March, 2025;
originally announced March 2025.
-
History-Matching of Imbibition Flow in Multiscale Fractured Porous Media Using Physics-Informed Neural Networks (PINNs)
Authors:
Jassem Abbasi,
Ben Moseley,
Takeshi Kurotori,
Ameya D. Jagtap,
Anthony R. Kovscek,
Aksel Hiorth,
Pål Østebø Andersen
Abstract:
We propose a workflow based on physics-informed neural networks (PINNs) to model multiphase fluid flow in fractured porous media. After validating the workflow in forward and inverse modeling of a synthetic problem of flow in fractured porous media, we applied it to a real experimental dataset in which brine is injected at a constant pressure drop into a CO2 saturated naturally fractured shale cor…
▽ More
We propose a workflow based on physics-informed neural networks (PINNs) to model multiphase fluid flow in fractured porous media. After validating the workflow in forward and inverse modeling of a synthetic problem of flow in fractured porous media, we applied it to a real experimental dataset in which brine is injected at a constant pressure drop into a CO2 saturated naturally fractured shale core plug. The exact spatial positions of natural fractures and the dynamic in-situ distribution of fluids were imaged using a CT-scan setup. To model the targeted system, we followed a domain decomposition approach for matrix and fractures and a multi-network architecture for the separate calculation of water saturation and pressure. The flow equations in the matrix, fractures and interplay between them were solved during training. Prior to fully-coupled simulations, we proposed pre-training the model. This aided in a more efficient and successful training of the coupled system. Both for the synthetic and experimental inverse problems, we determined flow parameters within the matrix and the fractures. Multiple random initializations of network and system parameters were performed to assess the uncertainty and uniqueness of the results. The results confirmed the precision of the inverse calculated parameters in retrieving the main flow characteristics of the system. The consideration of multiscale matrix-fracture impacts is commonly overlooked in existing workflows. Accounting for them led to several orders of magnitude variations in the calculated flow properties compared to not accounting for them. To the best of our knowledge, the proposed PINNs-based workflow is the first to offer a reliable and computationally efficient solution for inverse modeling of multiphase flow in fractured porous media, achieved through history-matching noisy and multi-fidelity experimental measurements.
△ Less
Submitted 30 October, 2024; v1 submitted 28 October, 2024;
originally announced October 2024.
-
ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images
Authors:
Abhinaw Jagtap,
Nachiket Tapas,
R. G. Brajesh
Abstract:
In the fast-evolving field of Generative AI, platforms like MidJourney, DALL-E, and Stable Diffusion have transformed Text-to-Image (T2I) Generation. However, despite their impressive ability to create high-quality images, they often struggle to generate accurate text within these images. Theoretically, if we could achieve accurate text generation in AI images in a ``zero-shot'' manner, it would n…
▽ More
In the fast-evolving field of Generative AI, platforms like MidJourney, DALL-E, and Stable Diffusion have transformed Text-to-Image (T2I) Generation. However, despite their impressive ability to create high-quality images, they often struggle to generate accurate text within these images. Theoretically, if we could achieve accurate text generation in AI images in a ``zero-shot'' manner, it would not only make AI-generated images more meaningful but also democratize the graphic design industry. The first step towards this goal is to create a robust scoring matrix for evaluating text accuracy in AI-generated images. Although there are existing bench-marking methods like CLIP SCORE and T2I-CompBench++, there's still a gap in systematically evaluating text and typography in AI-generated images, especially with diffusion-based methods. In this paper, we introduce a novel evaluation matrix designed explicitly for quantifying the performance of text and typography generation within AI-generated images. We have used letter by letter matching strategy to compute the exact matching scores from the reference text to the AI generated text. Our novel approach to calculate the score takes care of multiple redundancies such as repetition of words, case sensitivity, mixing of words, irregular incorporation of letters etc. Moreover, we have developed a Novel method named as brevity adjustment to handle excess text. In addition we have also done a quantitative analysis of frequent errors arise due to frequently used words and less frequently used words. Project page is available at: https://github.com/Abhinaw3906/ABHINAW-MATRIX.
△ Less
Submitted 18 September, 2024;
originally announced September 2024.
-
Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism
Authors:
Shuvayan Brahmachary,
Subodh M. Joshi,
Aniruddha Panda,
Kaushik Koneripalli,
Arun Kumar Sagotra,
Harshil Patel,
Ankush Sharma,
Ameya D. Jagtap,
Kaushic Kalyanaraman
Abstract:
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, prompting interest in their application as black-box optimizers. This paper asserts that LLMs possess the capability for zero-shot optimization across diverse scenarios, including multi-objective and high-dimensional problems. We introduce a novel population-based method for numerical optimization using LLMs called Lang…
▽ More
Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, prompting interest in their application as black-box optimizers. This paper asserts that LLMs possess the capability for zero-shot optimization across diverse scenarios, including multi-objective and high-dimensional problems. We introduce a novel population-based method for numerical optimization using LLMs called Language-Model-Based Evolutionary Optimizer (LEO). Our hypothesis is supported through numerical examples, spanning benchmark and industrial engineering problems such as supersonic nozzle shape optimization, heat transfer, and windfarm layout optimization. We compare our method to several gradient-based and gradient-free optimization approaches. While LLMs yield comparable results to state-of-the-art methods, their imaginative nature and propensity to hallucinate demand careful handling. We provide practical guidelines for obtaining reliable answers from LLMs and discuss method limitations and potential research directions.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
RiemannONets: Interpretable Neural Operators for Riemann Problems
Authors:
Ahmad Peyvan,
Vivek Oommen,
Ameya D. Jagtap,
George Em Karniadakis
Abstract:
Developing the proper representations for simulating high-speed flows with strong shock waves, rarefactions, and contact discontinuities has been a long-standing question in numerical analysis. Herein, we employ neural operators to solve Riemann problems encountered in compressible flows for extreme pressure jumps (up to $10^{10}$ pressure ratio). In particular, we first consider the DeepONet that…
▽ More
Developing the proper representations for simulating high-speed flows with strong shock waves, rarefactions, and contact discontinuities has been a long-standing question in numerical analysis. Herein, we employ neural operators to solve Riemann problems encountered in compressible flows for extreme pressure jumps (up to $10^{10}$ pressure ratio). In particular, we first consider the DeepONet that we train in a two-stage process, following the recent work of \cite{lee2023training}, wherein the first stage, a basis is extracted from the trunk net, which is orthonormalized and subsequently is used in the second stage in training the branch net. This simple modification of DeepONet has a profound effect on its accuracy, efficiency, and robustness and leads to very accurate solutions to Riemann problems compared to the vanilla version. It also enables us to interpret the results physically as the hierarchical data-driven produced basis reflects all the flow features that would otherwise be introduced using ad hoc feature expansion layers. We also compare the results with another neural operator based on the U-Net for low, intermediate, and very high-pressure ratios that are very accurate for Riemann problems, especially for large pressure ratios, due to their multiscale nature but computationally more expensive. Overall, our study demonstrates that simple neural network architectures, if properly pre-trained, can achieve very accurate solutions of Riemann problems for real-time forecasting. The source code, along with its corresponding data, can be found at the following URL: https://github.com/apey236/RiemannONet/tree/main
△ Less
Submitted 16 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Heart Rate Detection Using an Event Camera
Authors:
Aniket Jagtap,
RamaKrishna Venkatesh Saripalli,
Joe Lemley,
Waseem Shariff,
Alan F. Smeaton
Abstract:
Event cameras, also known as neuromorphic cameras, are an emerging technology that offer advantages over traditional shutter and frame-based cameras, including high temporal resolution, low power consumption, and selective data acquisition. In this study, we propose to harnesses the capabilities of event-based cameras to capture subtle changes in the surface of the skin caused by the pulsatile flo…
▽ More
Event cameras, also known as neuromorphic cameras, are an emerging technology that offer advantages over traditional shutter and frame-based cameras, including high temporal resolution, low power consumption, and selective data acquisition. In this study, we propose to harnesses the capabilities of event-based cameras to capture subtle changes in the surface of the skin caused by the pulsatile flow of blood in the wrist region. We investigate whether an event camera could be used for continuous noninvasive monitoring of heart rate (HR). Event camera video data from 25 participants, comprising varying age groups and skin colours, was collected and analysed. Ground-truth HR measurements obtained using conventional methods were used to evaluate of the accuracy of automatic detection of HR from event camera data. Our experimental results and comparison to the performance of other non-contact HR measurement methods demonstrate the feasibility of using event cameras for pulse detection. We also acknowledge the challenges and limitations of our method, such as light-induced flickering and the sub-conscious but naturally-occurring tremors of an individual during data capture.
△ Less
Submitted 21 September, 2023;
originally announced September 2023.
-
Deep smoothness WENO scheme for two-dimensional hyperbolic conservation laws: A deep learning approach for learning smoothness indicators
Authors:
Tatiana Kossaczká,
Ameya D. Jagtap,
Matthias Ehrhardt
Abstract:
In this paper, we introduce an improved version of the fifth-order weighted essentially non-oscillatory (WENO) shock-capturing scheme by incorporating deep learning techniques. The established WENO algorithm is improved by training a compact neural network to adjust the smoothness indicators within the WENO scheme. This modification enhances the accuracy of the numerical results, particularly near…
▽ More
In this paper, we introduce an improved version of the fifth-order weighted essentially non-oscillatory (WENO) shock-capturing scheme by incorporating deep learning techniques. The established WENO algorithm is improved by training a compact neural network to adjust the smoothness indicators within the WENO scheme. This modification enhances the accuracy of the numerical results, particularly near abrupt shocks. Unlike previous deep learning-based methods, no additional post-processing steps are necessary for maintaining consistency. We demonstrate the superiority of our new approach using several examples from the literature for the two-dimensional Euler equations of gas dynamics. Through intensive study of these test problems, which involve various shocks and rarefaction waves, the new technique is shown to outperform traditional fifth-order WENO schemes, especially in cases where the numerical solutions exhibit excessive diffusion or overshoot around shocks.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
A unified scalable framework for causal sweeping strategies for Physics-Informed Neural Networks (PINNs) and their temporal decompositions
Authors:
Michael Penwarden,
Ameya D. Jagtap,
Shandian Zhe,
George Em Karniadakis,
Robert M. Kirby
Abstract:
Physics-informed neural networks (PINNs) as a means of solving partial differential equations (PDE) have garnered much attention in the Computational Science and Engineering (CS&E) world. However, a recent topic of interest is exploring various training (i.e., optimization) challenges - in particular, arriving at poor local minima in the optimization landscape results in a PINN approximation givin…
▽ More
Physics-informed neural networks (PINNs) as a means of solving partial differential equations (PDE) have garnered much attention in the Computational Science and Engineering (CS&E) world. However, a recent topic of interest is exploring various training (i.e., optimization) challenges - in particular, arriving at poor local minima in the optimization landscape results in a PINN approximation giving an inferior, and sometimes trivial, solution when solving forward time-dependent PDEs with no data. This problem is also found in, and in some sense more difficult, with domain decomposition strategies such as temporal decomposition using XPINNs. We furnish examples and explanations for different training challenges, their cause, and how they relate to information propagation and temporal decomposition. We then propose a new stacked-decomposition method that bridges the gap between time-marching PINNs and XPINNs. We also introduce significant computational speed-ups by using transfer learning concepts to initialize subnetworks in the domain and loss tolerance-based propagation for the subdomains. Finally, we formulate a new time-sweeping collocation point algorithm inspired by the previous PINNs causality literature, which our framework can still describe, and provides a significant computational speed-up via reduced-cost collocation point segmentation. The proposed methods form our unified framework, which overcomes training challenges in PINNs and XPINNs for time-dependent PDEs by respecting the causality in multiple forms and improving scalability by limiting the computation required per optimization iteration. Finally, we provide numerical results for these methods on baseline PDE problems for which unmodified PINNs and XPINNs struggle to train.
△ Less
Submitted 18 September, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Learning stiff chemical kinetics using extended deep neural operators
Authors:
Somdatta Goswami,
Ameya D. Jagtap,
Hessam Babaee,
Bryan T. Susi,
George Em Karniadakis
Abstract:
We utilize neural operators to learn the solution propagator for the challenging chemical kinetics equation. Specifically, we apply the deep operator network (DeepONet) along with its extensions, such as the autoencoder-based DeepONet and the newly proposed Partition-of-Unity (PoU-) DeepONet to study a range of examples, including the ROBERS problem with three species, the POLLU problem with 25 sp…
▽ More
We utilize neural operators to learn the solution propagator for the challenging chemical kinetics equation. Specifically, we apply the deep operator network (DeepONet) along with its extensions, such as the autoencoder-based DeepONet and the newly proposed Partition-of-Unity (PoU-) DeepONet to study a range of examples, including the ROBERS problem with three species, the POLLU problem with 25 species, pure kinetics of the syngas skeletal model for $CO/H_2$ burning, which contains 11 species and 21 reactions and finally, a temporally developing planar $CO/H_2$ jet flame (turbulent flame) using the same syngas mechanism. We have demonstrated the advantages of the proposed approach through these numerical examples. Specifically, to train the DeepONet for the syngas model, we solve the skeletal kinetic model for different initial conditions. In the first case, we parametrize the initial conditions based on equivalence ratios and initial temperature values. In the second case, we perform a direct numerical simulation of a two-dimensional temporally developing $CO/H_2$ jet flame. Then, we initialize the kinetic model by the thermochemical states visited by a subset of grid points at different time snapshots. Stiff problems are computationally expensive to solve with traditional stiff solvers. Thus, this work aims to develop a neural operator-based surrogate model to solve stiff chemical kinetics. The operator, once trained offline, can accurately integrate the thermochemical state for arbitrarily large time advancements, leading to significant computational gains compared to stiff integration schemes.
△ Less
Submitted 23 February, 2023;
originally announced February 2023.
-
Augmented Physics-Informed Neural Networks (APINNs): A gating network-based soft domain decomposition methodology
Authors:
Zheyuan Hu,
Ameya D. Jagtap,
George Em Karniadakis,
Kenji Kawaguchi
Abstract:
In this paper, we propose the augmented physics-informed neural network (APINN), which adopts soft and trainable domain decomposition and flexible parameter sharing to further improve the extended PINN (XPINN) as well as the vanilla PINN methods. In particular, a trainable gate network is employed to mimic the hard decomposition of XPINN, which can be flexibly fine-tuned for discovering a potentia…
▽ More
In this paper, we propose the augmented physics-informed neural network (APINN), which adopts soft and trainable domain decomposition and flexible parameter sharing to further improve the extended PINN (XPINN) as well as the vanilla PINN methods. In particular, a trainable gate network is employed to mimic the hard decomposition of XPINN, which can be flexibly fine-tuned for discovering a potentially better partition. It weight-averages several sub-nets as the output of APINN. APINN does not require complex interface conditions, and its sub-nets can take advantage of all training samples rather than just part of the training data in their subdomains. Lastly, each sub-net shares part of the common parameters to capture the similar components in each decomposed function. Furthermore, following the PINN generalization theory in Hu et al. [2021], we show that APINN can improve generalization by proper gate network initialization and general domain & function decomposition. Extensive experiments on different types of PDEs demonstrate how APINN improves the PINN and XPINN methods. Specifically, we present examples where XPINN performs similarly to or worse than PINN, so that APINN can significantly improve both. We also show cases where XPINN is already better than PINN, so APINN can still slightly improve XPINN. Furthermore, we visualize the optimized gating networks and their optimization trajectories, and connect them with their performance, which helps discover the possibly optimal decomposition. Interestingly, if initialized by different decomposition, the performances of corresponding APINNs can differ drastically. This, in turn, shows the potential to design an optimal domain decomposition for the differential equation problem under consideration.
△ Less
Submitted 29 September, 2023; v1 submitted 16 November, 2022;
originally announced November 2022.
-
How important are activation functions in regression and classification? A survey, performance comparison, and future directions
Authors:
Ameya D. Jagtap,
George Em Karniadakis
Abstract:
Inspired by biological neurons, the activation functions play an essential part in the learning process of any artificial neural network commonly used in many real-world problems. Various activation functions have been proposed in the literature for classification as well as regression tasks. In this work, we survey the activation functions that have been employed in the past as well as the curren…
▽ More
Inspired by biological neurons, the activation functions play an essential part in the learning process of any artificial neural network commonly used in many real-world problems. Various activation functions have been proposed in the literature for classification as well as regression tasks. In this work, we survey the activation functions that have been employed in the past as well as the current state-of-the-art. In particular, we present various developments in activation functions over the years and the advantages as well as disadvantages or limitations of these activation functions. We also discuss classical (fixed) activation functions, including rectifier units, and adaptive activation functions. In addition to discussing the taxonomy of activation functions based on characterization, a taxonomy of activation functions based on applications is presented. To this end, the systematic comparison of various fixed and adaptive activation functions is performed for classification data sets such as the MNIST, CIFAR-10, and CIFAR- 100. In recent years, a physics-informed machine learning framework has emerged for solving problems related to scientific computations. For this purpose, we also discuss various requirements for activation functions that have been used in the physics-informed machine learning framework. Furthermore, various comparisons are made among different fixed and adaptive activation functions using various machine learning libraries such as TensorFlow, Pytorch, and JAX.
△ Less
Submitted 28 December, 2022; v1 submitted 6 September, 2022;
originally announced September 2022.
-
Automatic Generation of Synthetic Colonoscopy Videos for Domain Randomization
Authors:
Abhishek Dinkar Jagtap,
Mattias Heinrich,
Marian Himstedt
Abstract:
An increasing number of colonoscopic guidance and assistance systems rely on machine learning algorithms which require a large amount of high-quality training data. In order to ensure high performance, the latter has to resemble a substantial portion of possible configurations. This particularly addresses varying anatomy, mucosa appearance and image sensor characteristics which are likely deterior…
▽ More
An increasing number of colonoscopic guidance and assistance systems rely on machine learning algorithms which require a large amount of high-quality training data. In order to ensure high performance, the latter has to resemble a substantial portion of possible configurations. This particularly addresses varying anatomy, mucosa appearance and image sensor characteristics which are likely deteriorated by motion blur and inadequate illumination. The limited amount of readily available training data hampers to account for all of these possible configurations which results in reduced generalization capabilities of machine learning models. We propose an exemplary solution for synthesizing colonoscopy videos with substantial appearance and anatomical variations which enables to learn discriminative domain-randomized representations of the interior colon while mimicking real-world settings.
△ Less
Submitted 20 May, 2022;
originally announced May 2022.
-
Error estimates for physics informed neural networks approximating the Navier-Stokes equations
Authors:
Tim De Ryck,
Ameya D. Jagtap,
Siddhartha Mishra
Abstract:
We prove rigorous bounds on the errors resulting from the approximation of the incompressible Navier-Stokes equations with (extended) physics informed neural networks. We show that the underlying PDE residual can be made arbitrarily small for tanh neural networks with two hidden layers. Moreover, the total error can be estimated in terms of the training error, network size and number of quadrature…
▽ More
We prove rigorous bounds on the errors resulting from the approximation of the incompressible Navier-Stokes equations with (extended) physics informed neural networks. We show that the underlying PDE residual can be made arbitrarily small for tanh neural networks with two hidden layers. Moreover, the total error can be estimated in terms of the training error, network size and number of quadrature points. The theory is illustrated with numerical experiments.
△ Less
Submitted 2 February, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Physics-informed neural networks for inverse problems in supersonic flows
Authors:
Ameya D. Jagtap,
Zhiping Mao,
Nikolaus Adams,
George Em Karniadakis
Abstract:
Accurate solutions to inverse supersonic compressible flow problems are often required for designing specialized aerospace vehicles. In particular, we consider the problem where we have data available for density gradients from Schlieren photography as well as data at the inflow and part of wall boundaries. These inverse problems are notoriously difficult and traditional methods may not be adequat…
▽ More
Accurate solutions to inverse supersonic compressible flow problems are often required for designing specialized aerospace vehicles. In particular, we consider the problem where we have data available for density gradients from Schlieren photography as well as data at the inflow and part of wall boundaries. These inverse problems are notoriously difficult and traditional methods may not be adequate to solve such ill-posed inverse problems. To this end, we employ the physics-informed neural networks (PINNs) and its extended version, extended PINNs (XPINNs), where domain decomposition allows deploying locally powerful neural networks in each subdomain, which can provide additional expressivity in subdomains, where a complex solution is expected. Apart from the governing compressible Euler equations, we also enforce the entropy conditions in order to obtain viscosity solutions. Moreover, we enforce positivity conditions on density and pressure. We consider inverse problems involving two-dimensional expansion waves, two-dimensional oblique and bow shock waves. We compare solutions obtained by PINNs and XPINNs and invoke some theoretical results that can be used to decide on the generalization errors of the two methods.
△ Less
Submitted 23 February, 2022;
originally announced February 2022.
-
When Do Extended Physics-Informed Neural Networks (XPINNs) Improve Generalization?
Authors:
Zheyuan Hu,
Ameya D. Jagtap,
George Em Karniadakis,
Kenji Kawaguchi
Abstract:
Physics-informed neural networks (PINNs) have become a popular choice for solving high-dimensional partial differential equations (PDEs) due to their excellent approximation power and generalization ability. Recently, Extended PINNs (XPINNs) based on domain decomposition methods have attracted considerable attention due to their effectiveness in modeling multiscale and multiphysics problems and th…
▽ More
Physics-informed neural networks (PINNs) have become a popular choice for solving high-dimensional partial differential equations (PDEs) due to their excellent approximation power and generalization ability. Recently, Extended PINNs (XPINNs) based on domain decomposition methods have attracted considerable attention due to their effectiveness in modeling multiscale and multiphysics problems and their parallelization. However, theoretical understanding on their convergence and generalization properties remains unexplored. In this study, we take an initial step towards understanding how and when XPINNs outperform PINNs. Specifically, for general multi-layer PINNs and XPINNs, we first provide a prior generalization bound via the complexity of the target functions in the PDE problem, and a posterior generalization bound via the posterior matrix norms of the networks after optimization. Moreover, based on our bounds, we analyze the conditions under which XPINNs improve generalization. Concretely, our theory shows that the key building block of XPINN, namely the domain decomposition, introduces a tradeoff for generalization. On the one hand, XPINNs decompose the complex PDE solution into several simple parts, which decreases the complexity needed to learn each part and boosts generalization. On the other hand, decomposition leads to less training data being available in each subdomain, and hence such model is typically prone to overfitting and may become less generalizable. Empirically, we choose five PDEs to show when XPINNs perform better than, similar to, or worse than PINNs, hence demonstrating and justifying our new theory.
△ Less
Submitted 18 October, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions
Authors:
Ameya D. Jagtap,
Yeonjong Shin,
Kenji Kawaguchi,
George Em Karniadakis
Abstract:
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that under suitable conditions, KNNs induce a faster decay…
▽ More
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that under suitable conditions, KNNs induce a faster decay of the loss than that by the feed-forward networks. This is also empirically verified through a set of computational examples. Furthermore, under certain technical assumptions, we establish global convergence of gradient descent for KNNs. As a specific case, we propose the Rowdy activation function that is designed to get rid of any saturation region by injecting sinusoidal fluctuations, which include trainable parameters. The proposed Rowdy activation function can be employed in any neural network architecture like feed-forward neural networks, Recurrent neural networks, Convolutional neural networks etc. The effectiveness of KNNs with Rowdy activation is demonstrated through various computational experiments including function approximation using feed-forward neural networks, solution inference of partial differential equations using the physics-informed neural networks, and standard deep learning benchmark problems using convolutional and fully-connected neural networks.
△ Less
Submitted 19 October, 2021; v1 submitted 20 May, 2021;
originally announced May 2021.
-
Parallel Physics-Informed Neural Networks via Domain Decomposition
Authors:
Khemraj Shukla,
Ameya D. Jagtap,
George Em Karniadakis
Abstract:
We develop a distributed framework for the physics-informed neural networks (PINNs) based on two recent extensions, namely conservative PINNs (cPINNs) and extended PINNs (XPINNs), which employ domain decomposition in space and in time-space, respectively. This domain decomposition endows cPINNs and XPINNs with several advantages over the vanilla PINNs, such as parallelization capacity, large repre…
▽ More
We develop a distributed framework for the physics-informed neural networks (PINNs) based on two recent extensions, namely conservative PINNs (cPINNs) and extended PINNs (XPINNs), which employ domain decomposition in space and in time-space, respectively. This domain decomposition endows cPINNs and XPINNs with several advantages over the vanilla PINNs, such as parallelization capacity, large representation capacity, efficient hyperparameter tuning, and is particularly effective for multi-scale and multi-physics problems. Here, we present a parallel algorithm for cPINNs and XPINNs constructed with a hybrid programming model described by MPI $+$ X, where X $\in \{\text{CPUs},~\text{GPUs}\}$. The main advantage of cPINN and XPINN over the more classical data and model parallel approaches is the flexibility of optimizing all hyperparameters of each neural network separately in each subdomain. We compare the performance of distributed cPINNs and XPINNs for various forward problems, using both weak and strong scalings. Our results indicate that for space domain decomposition, cPINNs are more efficient in terms of communication cost but XPINNs provide greater flexibility as they can also handle time-domain decomposition for any differential equations, and can deal with any arbitrarily shaped complex subdomains. To this end, we also present an application of the parallel XPINN method for solving an inverse diffusion problem with variable conductivity on the United States map, using ten regions as subdomains.
△ Less
Submitted 8 September, 2021; v1 submitted 20 April, 2021;
originally announced April 2021.
-
Locally adaptive activation functions with slope recovery term for deep and physics-informed neural networks
Authors:
Ameya D. Jagtap,
Kenji Kawaguchi,
George Em Karniadakis
Abstract:
We propose two approaches of locally adaptive activation functions namely, layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of activation function is achieved by introducing a scalable parameter in each layer (layer-wise) and for every neuron (neuron-wise) separately, and then optimizi…
▽ More
We propose two approaches of locally adaptive activation functions namely, layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of activation function is achieved by introducing a scalable parameter in each layer (layer-wise) and for every neuron (neuron-wise) separately, and then optimizing it using a variant of stochastic gradient descent algorithm. In order to further increase the training speed, an activation slope based slope recovery term is added in the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method, the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method is not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate the convergence by implicitly multiplying conditioning matrices to the gradient of the base method without any explicit computation of the conditioning matrix and the matrix-vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with the slope recovery are shown to accelerate the training process.
△ Less
Submitted 17 June, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
Survival of the Fittest in PlayerUnknown BattleGround
Authors:
Brij Rokad,
Tushar Karumudi,
Omkar Acharya,
Akshay Jagtap
Abstract:
The goal of this paper was to predict the placement in the multiplayer game PUBG (playerunknown battleground). In the game, up to one hundred players parachutes onto an island and scavenge for weapons and equipment to kill others, while avoiding getting killed themselves. The available safe area of the game map decreases in size over time, directing surviving players into tighter areas to force en…
▽ More
The goal of this paper was to predict the placement in the multiplayer game PUBG (playerunknown battleground). In the game, up to one hundred players parachutes onto an island and scavenge for weapons and equipment to kill others, while avoiding getting killed themselves. The available safe area of the game map decreases in size over time, directing surviving players into tighter areas to force encounters. The last player or team standing wins the round. In this paper specifically, we have tried to predict the placement of the player in the ultimate survival test. The data set has been taken from Kaggle. Entire dataset has 29 attributes which are categories to 1 label(winPlacePerc), training set has 4.5 million instances and testing set has 1.9 million. winPlacePerc is continuous category, which makes it harder to predict the survival of the fittest. To overcome this problem, we have applied multiple machine learning models to find the optimum prediction. Model consists of LightGBM Regression (Light Gradient Boosting Machine Regression), MultiLayer Perceptron, M5P (improvement on C4.5) and Random Forest. To measure the error rate, Mean Absolute Error has been used. With the final prediction we have achieved MAE of 0.02047, 0.065, 0.0592 and 0634 respectively.
△ Less
Submitted 15 May, 2019;
originally announced May 2019.
-
Secure Video Streaming Plug-In
Authors:
Avinash Bhujbal,
Ashish Jagtap,
Devendra Gurav,
Tino Jameskutty
Abstract:
Video sharing sites like YouTube, Metacafe, Dailymotion, Vimeo, etc. provide a platform for media content sharing among its users. Some of these videos are copyright protected and restricted from being downloaded and saved. But users can use various download managers or application programs to download and save these videos. This affects the incoming traffic on these websites reducing their hit ra…
▽ More
Video sharing sites like YouTube, Metacafe, Dailymotion, Vimeo, etc. provide a platform for media content sharing among its users. Some of these videos are copyright protected and restricted from being downloaded and saved. But users can use various download managers or application programs to download and save these videos. This affects the incoming traffic on these websites reducing their hit rate and consequently reducing their revenue. Adobe Flash Player is the most commonly used player for watching online videos. It uses RTMP (Real Time Messaging Protocol) to stream audio, video and data over the Internet, between a Flash Player and Adobe Flash Media Server.Here, we propose a plug-in that enables the site owner control over downloading of videos from such website. The plug-in will be installed at the client side with the consent of the user. When the video is being played this plug-in will send unique keys to the media server. The server will continue streaming the video after verifying the keys. Download managers or application programs will not be able to download the videos as they wont be able to create the unique keys that need to be sent to the server.
△ Less
Submitted 7 March, 2013;
originally announced March 2013.