Search | arXiv e-print repository

Regional climate risk assessment from climate models using probabilistic machine learning

Authors: Zhong Yi Wan, Ignacio Lopez-Gomez, Robert Carver, Tapio Schneider, John Anderson, Fei Sha, Leonardo Zepeda-Núñez

Abstract: Accurate, actionable climate information at km scales is crucial for robust natural hazard risk assessment and infrastructure planning. Simulating climate at these resolutions remains intractable, forcing reliance on downscaling: either physics-based or statistical methods that transform climate simulations from coarse to impact-relevant resolutions. One major challenge for downscaling is to compr… ▽ More Accurate, actionable climate information at km scales is crucial for robust natural hazard risk assessment and infrastructure planning. Simulating climate at these resolutions remains intractable, forcing reliance on downscaling: either physics-based or statistical methods that transform climate simulations from coarse to impact-relevant resolutions. One major challenge for downscaling is to comprehensively capture the interdependency among climate processes of interest, a prerequisite for representing climate hazards. However, current approaches either lack the desired scalability or are bespoke to specific types of hazards. We introduce GenFocal, a computationally efficient, general-purpose, end-to-end generative framework that gives rise to full probabilistic characterizations of complex climate processes interacting at fine spatiotemporal scales. GenFocal more accurately assesses extreme risk in the current climate than leading approaches, including one used in the US 5th National Climate Assessment. It produces plausible tracks of tropical cyclones, providing accurate statistics of their genesis and evolution, even when they are absent from the corresponding climate simulations. GenFocal also shows compelling results that are consistent with the literature on projecting climate impact on decadal timescales. GenFocal revolutionizes how climate simulations can be efficiently augmented with observations and harnessed to enable future climate impact assessments at the spatiotemporal scales relevant to local and regional communities. We believe this work establishes genAI as an effective paradigm for modeling complex, high-dimensional multivariate statistical correlations that have deterred precise quantification of climate risks associated with hazards such as wildfires, extreme heat, tropical cyclones, and flooding; thereby enabling the evaluation of adaptation strategies. △ Less

Submitted 16 June, 2025; v1 submitted 10 December, 2024; originally announced December 2024.

arXiv:2410.01776 [pdf, other]

Dynamical-generative downscaling of climate model ensembles

Authors: Ignacio Lopez-Gomez, Zhong Yi Wan, Leonardo Zepeda-Núñez, Tapio Schneider, John Anderson, Fei Sha

Abstract: Regional high-resolution climate projections are crucial for many applications, such as agriculture, hydrology, and natural hazard risk assessment. Dynamical downscaling, the state-of-the-art method to produce localized future climate information, involves running a regional climate model (RCM) driven by an Earth System Model (ESM), but it is too computationally expensive to apply to large climate… ▽ More Regional high-resolution climate projections are crucial for many applications, such as agriculture, hydrology, and natural hazard risk assessment. Dynamical downscaling, the state-of-the-art method to produce localized future climate information, involves running a regional climate model (RCM) driven by an Earth System Model (ESM), but it is too computationally expensive to apply to large climate projection ensembles. We propose a novel approach combining dynamical downscaling with generative artificial intelligence to reduce the cost and improve the uncertainty estimates of downscaled climate projections. In our framework, an RCM dynamically downscales ESM output to an intermediate resolution, followed by a generative diffusion model that further refines the resolution to the target scale. This approach leverages the generalizability of physics-based models and the sampling efficiency of diffusion models, enabling the downscaling of large multi-model ensembles. We evaluate our method against dynamically-downscaled climate projections from the CMIP6 ensemble. Our results demonstrate its ability to provide more accurate uncertainty bounds on future regional climate than alternatives such as dynamical downscaling of smaller ensembles, or traditional empirical statistical downscaling methods. We also show that dynamical-generative downscaling results in significantly lower errors than bias correction and spatial disaggregation (BCSD), and captures more accurately the spectra and multivariate correlations of meteorological fields. These characteristics make the dynamical-generative framework a flexible, accurate, and efficient way to downscale large ensembles of climate projections, currently out of reach for pure dynamical downscaling. △ Less

Submitted 2 October, 2024; originally announced October 2024.

arXiv:2409.18359 [pdf, other]

Generative AI for fast and accurate statistical computation of fluids

Authors: Roberto Molinaro, Samuel Lanthaler, Bogdan Raonić, Tobias Rohner, Victor Armegioiu, Stephan Simonis, Dana Grund, Yannick Ramic, Zhong Yi Wan, Fei Sha, Siddhartha Mishra, Leonardo Zepeda-Núñez

Abstract: We present a generative AI algorithm for addressing the pressing task of fast, accurate, and robust statistical computation of three-dimensional turbulent fluid flows. Our algorithm, termed as GenCFD, is based on an end-to-end conditional score-based diffusion model. Through extensive numerical experimentation with a set of challenging fluid flows, we demonstrate that GenCFD provides an accurate a… ▽ More We present a generative AI algorithm for addressing the pressing task of fast, accurate, and robust statistical computation of three-dimensional turbulent fluid flows. Our algorithm, termed as GenCFD, is based on an end-to-end conditional score-based diffusion model. Through extensive numerical experimentation with a set of challenging fluid flows, we demonstrate that GenCFD provides an accurate approximation of relevant statistical quantities of interest while also efficiently generating high-quality realistic samples of turbulent fluid flows and ensuring excellent spectral resolution. In contrast, ensembles of deterministic ML algorithms, trained to minimize mean square errors, regress to the mean flow. We present rigorous theoretical results uncovering the surprising mechanisms through which diffusion models accurately generate fluid flows. These mechanisms are illustrated with solvable toy models that exhibit the mathematically relevant features of turbulent fluid flows while being amenable to explicit analytical formulae. Our codes are publicly available at https://github.com/camlab-ethz/GenCFD. △ Less

Submitted 2 February, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: 120 pages, 33 figures

arXiv:2409.09217 [pdf, other]

Rational-WENO: A lightweight, physically-consistent three-point weighted essentially non-oscillatory scheme

Authors: Shantanu Shahane, Sheide Chammas, Deniz A. Bezgin, Aaron B. Buhendwa, Steffen J. Schmidt, Nikolaus A. Adams, Spencer H. Bryngelson, Yi-Fan Chen, Qing Wang, Fei Sha, Leonardo Zepeda-Núñez

Abstract: Conventional WENO3 methods are known to be highly dissipative at lower resolutions, introducing significant errors in the pre-asymptotic regime. In this paper, we employ a rational neural network to accurately estimate the local smoothness of the solution, dynamically adapting the stencil weights based on local solution features. As rational neural networks can represent fast transitions between s… ▽ More Conventional WENO3 methods are known to be highly dissipative at lower resolutions, introducing significant errors in the pre-asymptotic regime. In this paper, we employ a rational neural network to accurately estimate the local smoothness of the solution, dynamically adapting the stencil weights based on local solution features. As rational neural networks can represent fast transitions between smooth and sharp regimes, this approach achieves a granular reconstruction with significantly reduced dissipation, improving the accuracy of the simulation. The network is trained offline on a carefully chosen dataset of analytical functions, bypassing the need for differentiable solvers. We also propose a robust model selection criterion based on estimates of the interpolation's convergence order on a set of test functions, which correlates better with the model performance in downstream tasks. We demonstrate the effectiveness of our approach on several one-, two-, and three-dimensional fluid flow problems: our scheme generalizes across grid resolutions while handling smooth and discontinuous solutions. In most cases, our rational network-based scheme achieves higher accuracy than conventional WENO3 with the same stencil size, and in a few of them, it achieves accuracy comparable to WENO5, which uses a larger stencil. △ Less

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2408.02866 [pdf, other]

doi 10.1016/j.cma.2025.118036

Back-Projection Diffusion: Solving the Wideband Inverse Scattering Problem with Diffusion Models

Authors: Borong Zhang, Martín Guerra, Qin Li, Leonardo Zepeda-Núñez

Abstract: We present Wideband Back-Projection Diffusion, an end-to-end probabilistic framework for approximating the posterior distribution induced by the inverse scattering map from wideband scattering data. This framework produces highly accurate reconstructions, leveraging conditional diffusion models to draw samples, and also honors the symmetries of the underlying physics of wave-propagation. The proce… ▽ More We present Wideband Back-Projection Diffusion, an end-to-end probabilistic framework for approximating the posterior distribution induced by the inverse scattering map from wideband scattering data. This framework produces highly accurate reconstructions, leveraging conditional diffusion models to draw samples, and also honors the symmetries of the underlying physics of wave-propagation. The procedure is factored into two steps: the first step, inspired by the filtered back-propagation formula, transforms data into a physics-based latent representation, while the second step learns a conditional score function conditioned on this latent representation. These two steps individually obey their associated symmetries and are amenable to compression by imposing the rank structure found in the filtered back-projection formula. Empirically, our framework has both low sample and computational complexity, with its number of parameters scaling only sub-linearly with the target resolution, and has stable training dynamics. It provides sharp reconstructions effortlessly and is capable of recovering even sub-Nyquist features in the multiple-scattering regime. △ Less

Submitted 17 May, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

Comments: 14 pages, 8 figures; published in Computer Methods in Applied Mechanics and Engineering

Journal ref: Computer Methods in Applied Mechanics and Engineering 443 (2025) 118036 Computer Methods in Applied Mechanics and Engineering 443 (2025) 118036

arXiv:2408.02688 [pdf, other]

A probabilistic framework for learning non-intrusive corrections to long-time climate simulations from short-time training data

Authors: Benedikt Barthel Sorensen, Leonardo Zepeda-Núñez, Ignacio Lopez-Gomez, Zhong Yi Wan, Rob Carver, Fei Sha, Themistoklis Sapsis

Abstract: Chaotic systems, such as turbulent flows, are ubiquitous in science and engineering. However, their study remains a challenge due to the large range scales, and the strong interaction with other, often not fully understood, physics. As a consequence, the spatiotemporal resolution required for accurate simulation of these systems is typically computationally infeasible, particularly for application… ▽ More Chaotic systems, such as turbulent flows, are ubiquitous in science and engineering. However, their study remains a challenge due to the large range scales, and the strong interaction with other, often not fully understood, physics. As a consequence, the spatiotemporal resolution required for accurate simulation of these systems is typically computationally infeasible, particularly for applications of long-term risk assessment, such as the quantification of extreme weather risk due to climate change. While data-driven modeling offers some promise of alleviating these obstacles, the scarcity of high-quality simulations results in limited available data to train such models, which is often compounded by the lack of stability for long-horizon simulations. As such, the computational, algorithmic, and data restrictions generally imply that the probability of rare extreme events is not accurately captured. In this work we present a general strategy for training neural network models to non-intrusively correct under-resolved long-time simulations of chaotic systems. The approach is based on training a post-processing correction operator on under-resolved simulations nudged towards a high-fidelity reference. This enables us to learn the dynamics of the underlying system directly, which allows us to use very little training data, even when the statistics thereof are far from converged. Additionally, through the use of probabilistic network architectures we are able to leverage the uncertainty due to the limited training data to further improve extrapolation capabilities. We apply our framework to severely under-resolved simulations of quasi-geostrophic flow and demonstrate its ability to accurately predict the anisotropic statistics over time horizons more than 30 times longer than the data seen in training. △ Less

Submitted 22 November, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

arXiv:2402.04467 [pdf, other]

DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems

Authors: Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez

Abstract: Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invari… ▽ More Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invariant measure, i.e., a probability distribution that is invariant under the action of the dynamics, which dictates the long-term statistical behavior of the system. In this work, we leverage this structure to propose a new framework that targets learning the invariant measure as well as the dynamics, in contrast with typical methods that only target the misfit between trajectories, which often leads to divergence as the trajectories' length increases. We use our framework to propose a tractable and sample efficient objective that can be used with any existing learning objectives. Our Dynamics Stable Learning by Invariant Measure (DySLIM) objective enables model training that achieves better point-wise tracking and long-term statistical accuracy relative to other learning objectives. By targeting the distribution with a scalable regularization term, we hope that this approach can be extended to more complex systems exhibiting slowly-variant distributions, such as weather and climate models. △ Less

Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ICML 2024; Code to reproduce our experiments is available at https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/ergodic

arXiv:2306.07526 [pdf, other]

User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems

Authors: Marc Finzi, Anudhyan Boral, Andrew Gordon Wilson, Fei Sha, Leonardo Zepeda-Núñez

Abstract: Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about out… ▽ More Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about outliers and extreme events; however, querying that knowledge through conditional sampling or measuring probabilities is surprisingly difficult. Existing methods for conditional sampling at inference time seek mainly to enforce the constraints, which is insufficient to match the statistics of the distribution or compute the probability of the chosen events. To achieve these ends, optimally one would use the conditional score function, but its computation is typically intractable. In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. With this scheme we are able to sample conditionally on nonlinear userdefined events at inference time, and matches data statistics even when sampling from the tails of the distribution. △ Less

Submitted 12 June, 2023; originally announced June 2023.

Comments: ICML 2023 Conference

arXiv:2306.01174 [pdf, other]

Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations

Authors: Anudhyan Boral, Zhong Yi Wan, Leonardo Zepeda-Núñez, James Lottes, Qing Wang, Yi-fan Chen, John Roberts Anderson, Fei Sha

Abstract: We introduce a data-driven learning framework that assimilates two powerful ideas: ideal large eddy simulation (LES) from turbulence closure modeling and neural stochastic differential equations (SDE) for stochastic modeling. The ideal LES models the LES flow by treating each full-order trajectory as a random realization of the underlying dynamics, as such, the effect of small-scales is marginaliz… ▽ More We introduce a data-driven learning framework that assimilates two powerful ideas: ideal large eddy simulation (LES) from turbulence closure modeling and neural stochastic differential equations (SDE) for stochastic modeling. The ideal LES models the LES flow by treating each full-order trajectory as a random realization of the underlying dynamics, as such, the effect of small-scales is marginalized to obtain the deterministic evolution of the LES state. However, ideal LES is analytically intractable. In our work, we use a latent neural SDE to model the evolution of the stochastic process and an encoder-decoder pair for transforming between the latent space and the desired ideal flow field. This stands in sharp contrast to other types of neural parameterization of closure models where each trajectory is treated as a deterministic realization of the dynamics. We show the effectiveness of our approach (niLES - neural ideal LES) on a challenging chaotic dynamical system: Kolmogorov flow at a Reynolds number of 20,000. Compared to competing methods, our method can handle non-uniform geometries using unstructured meshes seamlessly. In particular, niLES leads to trajectories with more accurate statistics and enhances stability, particularly for long-horizon rollouts. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 18 pages

arXiv:2305.15618 [pdf, other]

Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models

Authors: Zhong Yi Wan, Ricardo Baptista, Yi-fan Chen, John Anderson, Anudhyan Boral, Fei Sha, Leonardo Zepeda-Núñez

Abstract: We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optim… ▽ More We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate. Our method produces realistic high-resolution outputs from low-resolution inputs, by upsampling resolutions of 8x and 16x. Moreover, our procedure correctly matches the statistics of physical quantities, even when the low-frequency content of the inputs and outputs do not match, a crucial but difficult-to-satisfy assumption needed by current state-of-the-art alternatives. Code for this work is available at: https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion. △ Less

Submitted 30 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023 (spotlight)

arXiv:2301.10391 [pdf, other]

Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems

Authors: Zhong Yi Wan, Leonardo Zepeda-Núñez, Anudhyan Boral, Fei Sha

Abstract: We present a data-driven, space-time continuous framework to learn surrogate models for complex physical systems described by advection-dominated partial differential equations. Those systems have slow-decaying Kolmogorov n-width that hinders standard methods, including reduced order modeling, from producing high-fidelity simulations at low cost. In this work, we construct hypernetwork-based laten… ▽ More We present a data-driven, space-time continuous framework to learn surrogate models for complex physical systems described by advection-dominated partial differential equations. Those systems have slow-decaying Kolmogorov n-width that hinders standard methods, including reduced order modeling, from producing high-fidelity simulations at low cost. In this work, we construct hypernetwork-based latent dynamical models directly on the parameter space of a compact representation network. We leverage the expressive power of the network and a specially designed consistency-inducing regularization to obtain latent trajectories that are both low-dimensional and smooth. These properties render our surrogate models highly efficient at inference time. We show the efficacy of our framework by learning models that generate accurate multi-step rollout predictions at much faster inference speed compared to competitors, for several challenging examples. △ Less

Submitted 6 February, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

Comments: 25 pages, 9 figures

arXiv:2212.06068 [pdf, other]

doi 10.1016/j.cam.2024.116050

Solving the Wide-band Inverse Scattering Problem via Equivariant Neural Networks

Authors: Borong Zhang, Leonardo Zepeda-Núñez, Qin Li

Abstract: This paper introduces a novel deep neural network architecture for solving the inverse scattering problem in frequency domain with wide-band data, by directly approximating the inverse map, thus avoiding the expensive optimization loop of classical methods. The architecture is motivated by the filtered back-projection formula in the full aperture regime and with homogeneous background, and it leve… ▽ More This paper introduces a novel deep neural network architecture for solving the inverse scattering problem in frequency domain with wide-band data, by directly approximating the inverse map, thus avoiding the expensive optimization loop of classical methods. The architecture is motivated by the filtered back-projection formula in the full aperture regime and with homogeneous background, and it leverages the underlying equivariance of the problem and compressibility of the integral operator. This drastically reduces the number of training parameters, and therefore the computational and sample complexity of the method. In particular, we obtain an architecture whose number of parameters scale sub-linearly with respect to the dimension of the inputs, while its inference complexity scales super-linearly but with very small constants. We provide several numerical tests that show that the current approach results in better reconstruction than optimization-based techniques such as full-waveform inversion, but at a fraction of the cost while being competitive with state-of-the-art machine learning methods. △ Less

Submitted 11 October, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: 21 pages, 9 figures, and 4 tables

Journal ref: Journal of Computational and Applied Mathematics, Volume 451, 1 December 2024, 116050

arXiv:2207.00556 [pdf, other]

Learning to correct spectral methods for simulating turbulent flows

Authors: Gideon Dresdner, Dmitrii Kochkov, Peter Norgaard, Leonardo Zepeda-Núñez, Jamie A. Smith, Michael P. Brenner, Stephan Hoyer

Abstract: Despite their ubiquity throughout science and engineering, only a handful of partial differential equations (PDEs) have analytical, or closed-form solutions. This motivates a vast amount of classical work on numerical simulation of PDEs and more recently, a whirlwind of research into data-driven techniques leveraging machine learning (ML). A recent line of work indicates that a hybrid of classical… ▽ More Despite their ubiquity throughout science and engineering, only a handful of partial differential equations (PDEs) have analytical, or closed-form solutions. This motivates a vast amount of classical work on numerical simulation of PDEs and more recently, a whirlwind of research into data-driven techniques leveraging machine learning (ML). A recent line of work indicates that a hybrid of classical numerical techniques and machine learning can offer significant improvements over either approach alone. In this work, we show that the choice of the numerical scheme is crucial when incorporating physics-based priors. We build upon Fourier-based spectral methods, which are known to be more efficient than other numerical schemes for simulating PDEs with smooth and periodic solutions. Specifically, we develop ML-augmented spectral solvers for three common PDEs of fluid dynamics. Our models are more accurate (2-4x) than standard spectral solvers at the same resolution but have longer overall runtimes (~2x), due to the additional runtime cost of the neural network component. We also demonstrate a handful of key design principles for combining machine learning and numerical methods for solving PDEs. △ Less

Submitted 25 June, 2023; v1 submitted 1 July, 2022; originally announced July 2022.

arXiv:2106.01143 [pdf, other]

Accurate and Robust Deep Learning Framework for Solving Wave-Based Inverse Problems in the Super-Resolution Regime

Authors: Matthew Li, Laurent Demanet, Leonardo Zepeda-Núñez

Abstract: We propose an end-to-end deep learning framework that comprehensively solves the inverse wave scattering problem across all length scales. Our framework consists of the newly introduced wide-band butterfly network coupled with a simple training procedure that dynamically injects noise during training. While our trained network provides competitive results in classical imaging regimes, most notably… ▽ More We propose an end-to-end deep learning framework that comprehensively solves the inverse wave scattering problem across all length scales. Our framework consists of the newly introduced wide-band butterfly network coupled with a simple training procedure that dynamically injects noise during training. While our trained network provides competitive results in classical imaging regimes, most notably it also succeeds in the super-resolution regime where other comparable methods fail. This encompasses both (i) reconstruction of scatterers with sub-wavelength geometric features, and (ii) accurate imaging when two or more scatterers are separated by less than the classical diffraction limit. We demonstrate these properties are retained even in the presence of strong noise and extend to scatterers not previously seen in the training set. In addition, our network is straightforward to train requiring no restarts and has an online runtime that is an order of magnitude faster than optimization-based algorithms. We perform experiments with a variety of wave scattering mediums and we demonstrate that our proposed framework outperforms both classical inversion and competing network architectures that specialize in oscillatory wave scattering data. △ Less

Submitted 2 June, 2021; originally announced June 2021.

Comments: 22 pages

arXiv:2011.12413 [pdf, other]

Wide-band butterfly network: stable and efficient inversion via multi-frequency neural networks

Authors: Matthew Li, Laurent Demanet, Leonardo Zepeda-Núñez

Abstract: We introduce an end-to-end deep learning architecture called the wide-band butterfly network (WideBNet) for approximating the inverse scattering map from wide-band scattering data. This architecture incorporates tools from computational harmonic analysis, such as the butterfly factorization, and traditional multi-scale methods, such as the Cooley-Tukey FFT algorithm, to drastically reduce the numb… ▽ More We introduce an end-to-end deep learning architecture called the wide-band butterfly network (WideBNet) for approximating the inverse scattering map from wide-band scattering data. This architecture incorporates tools from computational harmonic analysis, such as the butterfly factorization, and traditional multi-scale methods, such as the Cooley-Tukey FFT algorithm, to drastically reduce the number of trainable parameters to match the inherent complexity of the problem. As a result WideBNet is efficient: it requires fewer training points than off-the-shelf architectures, and has stable training dynamics, thus it can rely on standard weight initialization strategies. The architecture automatically adapts to the dimensions of the data with only a few hyper-parameters that the user must specify. WideBNet is able to produce images that are competitive with optimization-based approaches, but at a fraction of the cost, and we also demonstrate numerically that it learns to super-resolve scatterers in the full aperture scattering setup. △ Less

Submitted 28 October, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

arXiv:2010.05295 [pdf, other]

Efficient Long-Range Convolutions for Point Clouds

Authors: Yifan Peng, Lin Lin, Lexing Ying, Leonardo Zepeda-Núñez

Abstract: The efficient treatment of long-range interactions for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels. This can often significantly increase the computational cost. In this work, we present a novel neural network layer that direc… ▽ More The efficient treatment of long-range interactions for point clouds is a challenging problem in many scientific machine learning applications. To extract global information, one usually needs a large window size, a large number of layers, and/or a large number of channels. This can often significantly increase the computational cost. In this work, we present a novel neural network layer that directly incorporates long-range information for a point cloud. This layer, dubbed the long-range convolutional (LRC)-layer, leverages the convolutional theorem coupled with the non-uniform Fourier transform. In a nutshell, the LRC-layer mollifies the point cloud to an adequately sized regular grid, computes its Fourier transform, multiplies the result by a set of trainable Fourier multipliers, computes the inverse Fourier transform, and finally interpolates the result back to the point cloud. The resulting global all-to-all convolution operation can be performed in nearly-linear time asymptotically with respect to the number of input points. The LRC-layer is a particularly powerful tool when combined with local convolution as together they offer efficient and seamless treatment of both short and long range interactions. We showcase this framework by introducing a neural network architecture that combines LRC-layers with short-range convolutional layers to accurately learn the energy and force associated with a $N$-body potential. We also exploit the induced two-level decomposition and propose an efficient strategy to train the combined architecture with a reduced number of samples. △ Less

Submitted 11 October, 2020; originally announced October 2020.

arXiv:2002.10561 [pdf, other]

Learning the mapping $\mathbf{x}\mapsto \sum_{i=1}^d x_i^2$: the cost of finding the needle in a haystack

Authors: Jiefu Zhang, Leonardo Zepeda-Núñez, Yuan Yao, Lin Lin

Abstract: The task of using machine learning to approximate the mapping $\mathbf{x}\mapsto\sum_{i=1}^d x_i^2$ with $x_i\in[-1,1]$ seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, th… ▽ More The task of using machine learning to approximate the mapping $\mathbf{x}\mapsto\sum_{i=1}^d x_i^2$ with $x_i\in[-1,1]$ seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to find the sparse network embedded in the dense network is similar to finding the needle in a haystack, using a given number of samples of the function. We demonstrate that the cost (measured by sample complexity) of finding the needle is directly related to the Barron norm of the function. While only a small number of samples is needed to train a sparse network, the dense network trained with the same number of samples exhibits large test loss and a large generalization gap. In order to control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly more important as $d$ increases. The numerically observed sample complexity with explicit regularization scales as $\mathcal{O}(d^{2.5})$, which is in fact better than the theoretically predicted sample complexity that scales as $\mathcal{O}(d^{4})$. Without explicit regularization (also called implicit regularization), the numerically observed sample complexity is significantly higher and is close to $\mathcal{O}(d^{4.5})$. △ Less

Submitted 16 May, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

arXiv:1912.00775 [pdf, other]

Deep Density: circumventing the Kohn-Sham equations via symmetry preserving neural networks

Authors: Leonardo Zepeda-Núñez, Yixiao Chen, Jiefu Zhang, Weile Jia, Linfeng Zhang, Lin Lin

Abstract: The recently developed Deep Potential [Phys. Rev. Lett. 120, 143001, 2018] is a powerful method to represent general inter-atomic potentials using deep neural networks. The success of Deep Potential rests on the proper treatment of locality and symmetry properties of each component of the network. In this paper, we leverage its network structure to effectively represent the mapping from the atomic… ▽ More The recently developed Deep Potential [Phys. Rev. Lett. 120, 143001, 2018] is a powerful method to represent general inter-atomic potentials using deep neural networks. The success of Deep Potential rests on the proper treatment of locality and symmetry properties of each component of the network. In this paper, we leverage its network structure to effectively represent the mapping from the atomic configuration to the electron density in Kohn-Sham density function theory (KS-DFT). By directly targeting at the self-consistent electron density, we demonstrate that the adapted network architecture, called the Deep Density, can effectively represent the electron density as the linear combination of contributions from many local clusters. The network is constructed to satisfy the translation, rotation, and permutation symmetries, and is designed to be transferable to different system sizes. We demonstrate that using a relatively small number of training snapshots, Deep Density achieves excellent performance for one-dimensional insulating and metallic systems, as well as systems with mixed insulating and metallic characters. We also demonstrate its performance for real three-dimensional systems, including small organic molecules, as well as extended systems such as water (up to $512$ molecules) and aluminum (up to $256$ atoms). △ Less

Submitted 27 November, 2019; originally announced December 2019.

Showing 1–18 of 18 results for author: Zepeda-Núñez, L