
Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Authors: Yoel Zimmermann, Adib Bazgir, Zartashia Afzal, Fariha Agbere, Qianxiang Ai, Nawaf Alampara, Alexander Al-Feghali, Mehrad Ansari, Dmytro Antypov, Amro Aswad, Jiaru Bai, Viktoriia Baibakova, Devi Dutta Biswajeet, Erik Bitzek, Joshua D. Bocarsly, Anna Borisova, Andres M Bran, L. Catherine Brinson, Marcel Moran Calderon, Alessandro Canalicchio, Victor Chen, Yuan Chiang, Defne Circi, Benjamin Charmes, Vikrant Chaudhary, et al. (119 additional authors not shown)

Abstract: Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping custom applications in scientific research.

Submitted 2 January, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

Comments: Updating author information, the submission remains largely unchanged. 98 pages total

arXiv:2404.09950

Parametric Sensitivities of a Wind-driven Baroclinic Ocean Using Neural Surrogates

Authors: Yixuan Sun, Elizabeth Cucuzzella, Steven Brus, Sri Hari Krishna Narayanan, Balasubramanya Nadiga, Luke Van Roekel, Jan Hückelheim, Sandeep Madireddy, Patrick Heimbach

Abstract: Numerical models of the ocean and ice sheets are crucial for understanding and simulating the impact of greenhouse gases on the global climate. Oceanic processes affect phenomena such as hurricanes, extreme precipitation, and droughts. Ocean models rely on subgrid-scale parameterizations that require calibration and often significantly affect model skill. When model sensitivities to parameters can be computed by using approaches such as automatic differentiation, they can be used for such calibration toward reducing the misfit between model output and data. Because the SOMA model code is challenging to differentiate, we have created neural network-based surrogates for estimating the sensitivity of the ocean model to model parameters. We first generated perturbed parameter ensemble data for an idealized ocean model and trained three surrogate neural network models. The neural surrogates accurately predicted the one-step forward ocean dynamics, of which we then computed the parametric sensitivity.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2312.03876

Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

Authors: Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Sandeep Madireddy, Aditya Grover

Abstract: Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints are available at https://github.com/tung-nd/stormer.

Submitted 22 October, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Neural Information Processing Systems (NeurIPS 2024)

arXiv:2311.08421

Surrogate Neural Networks to Estimate Parametric Sensitivity of Ocean Models

Authors: Yixuan Sun, Elizabeth Cucuzzella, Steven Brus, Sri Hari Krishna Narayanan, Balu Nadiga, Luke Van Roekel, Jan Hückelheim, Sandeep Madireddy

Abstract: Modeling is crucial to understanding the effect of greenhouse gases, warming, and ice sheet melting on the ocean. At the same time, ocean processes affect phenomena such as hurricanes and droughts. Parameters in the models that cannot be physically measured have a significant effect on the model output. For an idealized ocean model, we generated perturbed parameter ensemble data and trained surrogate neural network models. The neural surrogates accurately predicted the one-step forward dynamics, of which we then computed the parametric sensitivity.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2202.11557

doi 10.1088/1361-6587/ac89ab

Single Gaussian Process Method for Arbitrary Tokamak Regimes with a Statistical Analysis

Authors: Jarrod Leddy, Sandeep Madireddy, Eric Howell, Scott Kruger

Abstract: Gaussian Process Regression (GPR) is a Bayesian method for inferring profiles based on input data. The technique is increasing in popularity in the fusion community due to its many advantages over traditional fitting techniques including intrinsic uncertainty quantification and robustness to over-fitting. This work investigates the use of a new method, the change-point method, for handling the varying length scales found in different tokamak regimes. The use of the Student's t-distribution for the Bayesian likelihood probability is also investigated and shown to be advantageous in providing good fits in profiles with many outliers. To compare different methods, synthetic data generated from analytic profiles is used to create a database enabling a quantitative statistical comparison of which methods perform the best. Using a full Bayesian approach with the change-point method, Matérn kernel for the prior probability, and Student's t-distribution for the likelihood is shown to give the best results.

Submitted 23 February, 2022; originally announced February 2022.

Comments: submitted to PPCF

arXiv:2110.13041

doi 10.3389/fdata.2022.787421

Applications and Techniques for Fast Machine Learning in Science

Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood, et al. (62 additional authors not shown)

Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

Submitted 25 October, 2021; originally announced October 2021.

Comments: 66 pages, 13 figures, 5 tables

Report number: FERMILAB-PUB-21-502-AD-E-SCD

Journal ref: Front. Big Data 5, 787421 (2022)

arXiv:1909.09144

Using recurrent neural networks for nonlinear component computation in advection-dominated reduced-order models

Authors: Romit Maulik, Vishwas Rao, Sandeep Madireddy, Bethany Lusch, Prasanna Balaprakash

Abstract: Rapid simulations of advection-dominated problems are vital for multiple engineering and geophysical applications. In this paper, we present a long short-term memory neural network to approximate the nonlinear component of the reduced-order model (ROM) of an advection-dominated partial differential equation. This is motivated by the fact that the nonlinear term is the most expensive component of a successful ROM. For our approach, we utilize a Galerkin projection to isolate the linear and the transient components of the dynamical system and then use discrete empirical interpolation to generate training data for supervised learning. We note that the numerical time-advancement and linear-term computation of the system ensure a greater preservation of physics than does a process that is fully modeled. Our results show that the proposed framework recovers transient dynamics accurately without nonlinear term computations in full-order space and represents a cost-effective alternative to solely equation-based ROMs.

Submitted 1 November, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

arXiv:1906.07815

doi 10.1016/j.physd.2020.132368

Time-series learning of latent-space dynamics for reduced-order model closure

Authors: Romit Maulik, Arvind Mohan, Bethany Lusch, Sandeep Madireddy, Prasanna Balaprakash, Daniel Livescu

Abstract: We study the performance of long short-term memory networks (LSTMs) and neural ordinary differential equations (NODEs) in learning latent-space representations of dynamical equations for an advection-dominated problem given by the viscous Burgers equation. Our formulation is devised in a non-intrusive manner with an equation-free evolution of dynamics in a reduced space with the latter being obtained through a proper orthogonal decomposition. In addition, we leverage the sequential nature of learning for both LSTMs and NODEs to demonstrate their capability for closure in systems which are not completely resolved in the reduced space. We assess our hypothesis for two advection-dominated problems given by the viscous Burgers equation. It is observed that both LSTMs and NODEs are able to reproduce the effects of the absent scales for our test cases more effectively than intrusive dynamics evolution through a Galerkin projection. This result empirically suggests that time-series learning techniques implicitly leverage a memory kernel for coarse-grained system closure as is suggested through the Mori-Zwanzig formalism.

Submitted 10 September, 2019; v1 submitted 15 June, 2019; originally announced June 2019.

arXiv:1904.05433

Phase Segmentation in Atom-Probe Tomography Using Deep Learning-Based Edge Detection

Authors: Sandeep Madireddy, Ding-Wen Chung, Troy Loeffler, Subramanian K. R. S. Sankaranarayanan, David N. Seidman, Prasanna Balaprakash, Olle Heinonen

Abstract: Atom-probe tomography (APT) facilitates nano- and atomic-scale characterization and analysis of microstructural features. Specifically, APT is well suited to study the interfacial properties of granular or heterophase systems. Traditionally, the identification of the interface between, for example, precipitate and matrix phases in APT data has been obtained either by extracting iso-concentration surfaces based on a user-supplied concentration value or by manually perturbing the concentration value until the iso-concentration surface qualitatively matches the interface. These approaches are subjective, not scalable, and may lead to inconsistencies due to local composition inhomogeneities. We propose a digital image segmentation approach based on deep neural networks that transfer learned knowledge from natural images to automatically segment the data obtained from APT into different phases. This approach not only provides an efficient way to segment the data and extract interfacial properties but does so without the need for expensive interface labeling for training the segmentation model. We consider here a system with a precipitate phase in a matrix and with three different interface modalities (layered, isolated, and interconnected) that are obtained for different relative geometries of the precipitate phase. We demonstrate the accuracy of our segmentation approach through qualitative visualization of the interfaces, as well as through quantitative comparisons with proximity histograms obtained by using more traditional approaches.

Submitted 10 April, 2019; originally announced April 2019.

Comments: 23 pages, 6 figures

Showing 1–9 of 9 results for author: Madireddy, S