Search | arXiv e-print repository

Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions

Authors: Keita Teranishi, Harshitha Menon, William F. Godoy, Prasanna Balaprakash, David Bau, Tal Ben-Nun, Abhinav Bhatele, Franz Franchetti, Michael Franusich, Todd Gamblin, Giorgis Georgakoudis, Tom Goldstein, Arjun Guha, Steven Hahn, Costin Iancu, Zheming Jin, Terry Jones, Tze Meng Low, Het Mankad, Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Daniel Nichols, Konstantinos Parasyris, Swaroop Pophale, Pedro Valero-Lara, et al. (3 additional authors not shown)

Abstract: We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. AI technologies, in particular large language models, have transformed every aspect of software development. For its part, HPC software is recognized as a highly specialized scientific field of its own. We discuss the challenges associated with leveraging state-of-the-art AI technologies to develop such a unique and niche class of software and outline our research directions in the two US Department of Energy-funded projects for advancing HPC Software via AI: Ellora and Durban.

Submitted 12 May, 2025; originally announced May 2025.

Comments: 12 pages, 1 Figure, Accepted at "The 1st International Workshop on Foundational Large Language Models Advances for HPC" LLM4HPC to be held in conjunction with ISC High Performance 2025

arXiv:2505.05623 [pdf, other]

Characterizing GPU Energy Usage in Exascale-Ready Portable Science Applications

Authors: William F. Godoy, Oscar Hernandez, Paul R. C. Kent, Maria Patrou, Kazi Asifuzzaman, Narasinga Rao Miniskar, Pedro Valero-Lara, Jeffrey S. Vetter, Matthew D. Sinclair, Jason Lowe-Power, Bobby R. Bruce

Abstract: We characterize the GPU energy usage of two widely adopted exascale-ready applications representing two classes of particle and mesh solvers: (i) QMCPACK, a quantum Monte Carlo package, and (ii) AMReX-Castro, an adaptive mesh astrophysical code. We analyze power, temperature, utilization, and energy traces from double-/single (mixed)-precision benchmarks on NVIDIA's A100 and H100 and AMD's MI250X GPUs using queries in NVML and rocm_smi_lib, respectively. We explore application-specific metrics to provide insights on energy vs. performance trade-offs. Our results suggest that mixed-precision energy savings are 6-25% on QMCPACK and 45% on AMReX-Castro. Also, we found gaps in the AMD tooling used on Frontier GPUs that need to be understood, while query resolutions on NVML have little variability between 1 ms and 1 s. Overall, application-level knowledge is crucial to define energy-cost/science-benefit opportunities for the codesign of future supercomputer architectures in the post-Moore era.

Submitted 16 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

Comments: 13 pages, 8 figures, 3 tables. Accepted at the Energy Efficiency with Sustainable Performance: Techniques, Tools, and Best Practices, EESP Workshop, in conjunction with ISC High Performance 2025

arXiv:2409.07299 [pdf, other]

From Memory Traces to Surface Chemistry: Decoding REDOX Reactions

Authors: Ana Luiza Costa Silva, Rafael Schio Wengenroth Silva, Lucas Augusto Moisés, Adenilson José Chiquito, Marcio Peron Franco de Godoy, Fabian Hartmann, Victor Lopez-Richard

Abstract: Gas and moisture sensing devices leveraging the resistive switching effect in transition metal oxide memristors promise to revolutionize next-generation, nano-scaled, cost-effective, and environmentally sustainable sensor solutions. These sensors encode readouts in resistance state changes based on gas concentration, yet their nonlinear current-voltage characteristics offer richer dynamics, capturing detailed information about REDOX reactions and surface kinetics. Traditional vertical devices fail to fully exploit this complexity. This study demonstrates planar resistive switching devices, moving beyond the Butler-Volmer model. A systematic investigation of the electrochemical processes in Na-doped ZnO with lateral planar contacts reveals intricate patterns resulting from REDOX reactions on the device surface. When combined with advanced algorithms for pattern recognition, these devices allow the analysis of complex switching patterns, including crossings, loop directions, and resistance values, providing unprecedented insights for next-generation complex sensors.

Submitted 28 October, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

arXiv:2312.02200 [pdf, other]

An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets

Authors: Maya Srikanth, Jeremy Irvin, Brian Wesley Hill, Felipe Godoy, Ishan Sabane, Andrew Y. Ng

Abstract: Major advancements in computer vision can primarily be attributed to the use of labeled datasets. However, acquiring labels for datasets often results in errors which can harm model performance. Recent works have proposed methods to automatically identify mislabeled images, but developing strategies to effectively implement them in real world datasets has been sparsely explored. Towards improved data-centric methods for cleaning real world vision datasets, we first conduct more than 200 experiments carefully benchmarking recently developed automated mislabel detection methods on multiple datasets under a variety of synthetic and real noise settings with varying noise levels. We compare these methods to a Simple and Efficient Mislabel Detector (SEMD) that we craft, and find that SEMD performs similarly to or outperforms prior mislabel detection approaches. We then apply SEMD to multiple real world computer vision datasets and test how dataset size, mislabel removal strategy, and mislabel removal amount further affect model performance after retraining on the cleaned data. With careful design of the approach, we find that mislabel removal leads to per-class performance improvements of up to 8% of a retrained classifier in smaller data regimes.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.16061 [pdf]

Evaluation of microscale crystallinity modification induced by laser writing on Mn3O4 thin films

Authors: Camila Ianhez-Pereira, Akhil Kuriakose, Ariano De Giovanni Rodrigues, Ana Luiza Costa Silva, Ottavia Jedrkiewicz, Monica Bollani, Marcio Peron Franco de Godoy

Abstract: Defining microstructures and managing local crystallinity allow the implementation of several functionalities in thin film technology. The use of ultrashort Bessel beams for bulk crystallinity modification has garnered considerable attention as a versatile technique for semiconductor materials, dielectrics, or metal oxide substrates. The aim of this work is the quantitative evaluation of the crystalline changes induced by ultrafast laser micromachining on manganese oxide thin films using micro-Raman spectroscopy. Pulsed Bessel beams featuring a 1 micrometer-sized central core are used to define structures with high spatial precision. The dispersion relation of Mn3O4 optical phonons is determined by considering the conjunction between X-ray diffraction characterization and the phonon localization model. The asymmetries in Raman spectra indicate phonon localization and enable a quantitative tool to determine the crystallite size at micrometer resolution. The results indicate that laser-writing is effective in modifying the low-crystallinity films locally, increasing crystallite sizes from ~8 nm up to 12 nm, and thus highlighting an interesting approach to evaluate laser-induced structural modifications on metal oxide thin films.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 27 pages

arXiv:2309.10292 [pdf, other]

doi 10.1145/3624062.3624278

Julia as a unifying end-to-end workflow language on the Frontier exascale system

Authors: William F. Godoy, Pedro Valero-Lara, Caira Anderson, Katrina W. Lee, Ana Gainaru, Rafael Ferreira da Silva, Jeffrey S. Vetter

Abstract: We evaluate Julia as a single language and ecosystem paradigm powered by LLVM to develop workflow components for high-performance computing. We run a Gray-Scott, 2-variable diffusion-reaction application using a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy's first exascale supercomputer. We evaluate the performance, scaling, and trade-offs of (i) the computational kernel on AMD's MI250X GPUs, (ii) weak scaling up to 4,096 MPI processes/GPUs or 512 nodes, (iii) parallel I/O writes using the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive analysis. Results suggest that although Julia generates a reasonable LLVM-IR, a nearly 50% performance difference exists vs. native AMD HIP stencil codes when running on the GPUs. As expected, we observed near-zero overhead when using MPI and parallel I/O bindings for system-wide installed implementations. Consequently, Julia emerges as a compelling high-performance and high-productivity workflow composition language, as measured on the fastest supercomputer in the world.

Submitted 27 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 11 pages, 8 figures, accepted at the 18th Workshop on Workflows in Support of Large-Scale Science (WORKS23), IEEE/ACM The International Conference for High Performance Computing, Networking, Storage, and Analysis, SC23

arXiv:2309.07103 [pdf, other]

Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation

Authors: Pedro Valero-Lara, Alexis Huante, Mustafa Al Lail, William F. Godoy, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter

Abstract: We evaluate the use of the open-source Llama-2 model for generating well-known, high-performance computing kernels (e.g., AXPY, GEMV, GEMM) on different parallel programming models and languages (e.g., C++: OpenMP, OpenMP Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python: numpy, Numba, pyCUDA, cuPy; and Julia: Threads, CUDA.jl, AMDGPU.jl). We built upon our previous work that is based on the OpenAI Codex, which is a descendant of GPT-3, to generate similar kernels with simple prompts via GitHub Copilot. Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric. Llama-2 has a simplified model that shows competitive or even superior accuracy. We also report on the differences between these foundational large language models as generative AI continues to redefine human-computer interactions. Overall, Copilot generates codes that are more reliable but less optimized, whereas codes generated by Llama-2 are less reliable but more optimized when correct.

Submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted at LCPC 2023, The 36th International Workshop on Languages and Compilers for Parallel Computing http://www.lcpcworkshop.org/LCPC23/ . 13 pages, 5 figures, 1 table

arXiv:2307.11502 [pdf, other]

doi 10.5281/zenodo.10420938

Software engineering to sustain a high-performance computing scientific application: QMCPACK

Authors: William F. Godoy, Steven E. Hahn, Michael M. Walsh, Philip W. Fackler, Jaron T. Krogel, Peter W. Doak, Paul R. C. Kent, Alfredo A. Correa, Ye Luo, Mark Dewing

Abstract: We provide an overview of the software engineering efforts and their impact in QMCPACK, a production-level ab-initio Quantum Monte Carlo open-source code targeting high-performance computing (HPC) systems. Aspects included are: (i) strategic expansion of continuous integration (CI) targeting CPUs, using GitHub Actions runners, and NVIDIA and AMD GPUs in pre-exascale systems, using self-hosted hardware; (ii) incremental reduction of memory leaks using sanitizers; (iii) incorporation of Docker containers for CI and reproducibility; and (iv) refactoring efforts to improve maintainability, testing coverage, and memory lifetime management. We quantify the value of these improvements by providing metrics to illustrate the shift towards a predictive, rather than reactive, sustainable maintenance approach. Our goal, in documenting the impact of these efforts on QMCPACK, is to contribute to the body of knowledge on the importance of research software engineering (RSE) for the sustainability of community HPC codes and scientific discovery at scale.

Submitted 21 July, 2023; originally announced July 2023.

Comments: Accepted at the first US-RSE Conference, USRSE2023, https://us-rse.org/usrse23/, 8 pages, 3 figures, 4 tables

arXiv:2306.15121 [pdf, other]

doi 10.1145/3605731.3605886

Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation

Authors: William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter

Abstract: We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG. We test the generated kernel codes for a variety of language-supported programming models, including (1) C++ (e.g., OpenMP [including offload], OpenACC, Kokkos, SYCL, CUDA, and HIP), (2) Fortran (e.g., OpenMP [including offload] and OpenACC), (3) Python (e.g., numpy, Numba, cuPy, and pyCUDA), and (4) Julia (e.g., Threads, CUDA.jl, AMDGPU.jl, and KernelAbstractions.jl). We use the GitHub Copilot capabilities powered by OpenAI Codex available in Visual Studio Code as of April 2023 to generate a vast amount of implementations given simple <kernel> + <programming model> + <optional hints> prompt variants. To quantify and compare the results, we propose a proficiency metric around the initial 10 suggestions given for each prompt. Results suggest that the OpenAI Codex outputs for C++ correlate with the adoption and maturity of programming models. For example, OpenMP and CUDA score really high, whereas HIP is still lacking. We found that prompts from either a targeted language such as Fortran or the more general-purpose Python can benefit from adding code keywords, while Julia prompts perform acceptably well for its mature programming models (e.g., Threads and CUDA.jl). We expect these benchmarks to provide a point of reference for each programming model's community. Overall, understanding the convergence of large language models, AI, and HPC is crucial due to its rapidly evolving nature and how it is redefining human-computer interactions.

Submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted at the Sixteenth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), 2023 to be held in conjunction with ICPP 2023: The 52nd International Conference on Parallel Processing. 10 pages, 6 figures, 5 tables

arXiv:2304.08393 [pdf, other]

Search for gravitational-lensing signatures in the full third observing run of the LIGO-Virgo network

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1670 additional authors not shown)

Abstract: Gravitational lensing by massive objects along the line of sight to the source causes distortions of gravitational-wave signals; such distortions may reveal information about fundamental physics, cosmology and astrophysics. In this work, we have extended the search for lensing signatures to all binary black hole events from the third observing run of the LIGO-Virgo network. We search for repeated signals from strong lensing by 1) performing targeted searches for subthreshold signals, 2) calculating the degree of overlap amongst the intrinsic parameters and sky location of pairs of signals, 3) comparing the similarities of the spectrograms amongst pairs of signals, and 4) performing dual-signal Bayesian analysis that takes into account selection effects and astrophysical knowledge. We also search for distortions to the gravitational waveform caused by 1) frequency-independent phase shifts in strongly lensed images, and 2) frequency-dependent modulation of the amplitude and phase due to point masses. None of these searches yields significant evidence for lensing. Finally, we use the non-detection of gravitational-wave lensing to constrain the lensing rate based on the latest merger-rate estimates and the fraction of dark matter composed of compact objects.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 28 pages, 11 figures

Report number: LIGO-P2200031

arXiv:2303.06195 [pdf, other]

doi 10.1109/IPDPSW59300.2023.00068

Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes

Authors: William F. Godoy, Pedro Valero-Lara, T. Elise Dettling, Christian Trefftz, Ian Jorquera, Thomas Sheehy, Ross G. Miller, Marc Gonzalez-Tallada, Jeffrey S. Vetter, Valentin Churavy

Abstract: We explore the performance and portability of the high-level programming models: the LLVM-based Julia and Python/Numba, and Kokkos on high-performance computing (HPC) nodes: AMD Epyc CPUs and MI250X graphics processing units (GPUs) on Frontier's test bed Crusher system and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facility. We compare the default performance of a hand-rolled dense matrix multiplication algorithm on CPUs against vendor-compiled C/OpenMP implementations, and on each GPU against CUDA and HIP. Rather than focusing on the kernel optimization per se, we select this naive approach to resemble exploratory work in science and as a lower-bound for performance to isolate the effect of each programming model. Julia and Kokkos perform comparably with C/OpenMP on CPUs, while Julia implementations are competitive with CUDA and HIP on GPUs. Performance gaps are identified on NVIDIA A100 GPUs for Julia's single precision and Kokkos, and for Python/Numba in all scenarios. We also comment on half-precision support, productivity, performance portability metrics, and platform readiness. We expect to contribute to the understanding and direction for high-level, high-productivity languages in HPC as the first-generation exascale systems are deployed.

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted at the 28th HIPS workshop, held in conjunction with IPDPS 2023. 10 pages, 9 figures

arXiv:2212.01477 [pdf, other]

doi 10.1093/mnras/stad3120

Search for subsolar-mass black hole binaries in the second part of Advanced LIGO's and Advanced Virgo's third observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1680 additional authors not shown)

Abstract: We describe a search for gravitational waves from compact binaries with at least one component with mass $0.2\,M_\odot$ to $1.0\,M_\odot$ and mass ratio $q \geq 0.1$ in Advanced LIGO and Advanced Virgo data collected between 1 November 2019, 15:00 UTC and 27 March 2020, 17:00 UTC. No signals were detected. The most significant candidate has a false alarm rate of $0.2\,\mathrm{yr}^{-1}$. We estimate the sensitivity of our search over the entirety of Advanced LIGO's and Advanced Virgo's third observing run, and present the most stringent limits to date on the merger rate of binary black holes with at least one subsolar-mass component. We use the upper limits to constrain two fiducial scenarios that could produce subsolar-mass black holes: primordial black holes (PBH) and a model of dissipative dark matter. The PBH model uses recent prescriptions for the merger rate of PBH binaries that include a rate suppression factor to effectively account for PBH early binary disruptions. If the PBHs are monochromatically distributed, we can exclude a dark matter fraction in PBHs $f_\mathrm{PBH} \gtrsim 0.6$ (at 90% confidence) in the probed subsolar-mass range. However, if we allow for broad PBH mass distributions we are unable to rule out $f_\mathrm{PBH} = 1$. For the dissipative model, where the dark matter has chemistry that allows a small fraction to cool and collapse into black holes, we find an upper bound $f_{\mathrm{DBH}} < 10^{-5}$ on the fraction of atomic dark matter collapsed into black holes.

Submitted 26 January, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: https://dcc.ligo.org/P2200139

arXiv:2211.07436 [pdf, other]

doi 10.1109/MCSE.2023.3253847

Giving RSEs a Larger Stage through the Better Scientific Software Fellowship

Authors: William F. Godoy, Ritu Arora, Keith Beattie, David E. Bernholdt, Sarah E. Bratt, Daniel S. Katz, Ignacio Laguna, Amiya K. Maji, Addi Malviya Thakur, Rafael M. Mudafort, Nitin Sukhija, Damian Rouson, Cindy Rubio-González, Karan Vahi

Abstract: The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. BSSwF's vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software production and sustainability. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). This paper provides case studies from several of the program's participants to illustrate some of the diverse ways BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as BSSwF can be a valuable means to recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations and ideas for a larger audience.

Submitted 14 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: submitted to Computing in Science & Engineering (CiSE), Special Issue on the Future of Research Software Engineers in the US

arXiv:2211.02740 [pdf, other]

Bridging HPC Communities through the Julia Programming Language

Authors: Valentin Churavy, William F. Godoy, Carsten Bauer, Hendrik Ranocha, Michael Schlottke-Lakemper, Ludovic Räss, Johannes Blaschke, Mosè Giordano, Erik Schnetter, Samuel Omlin, Jeffrey S. Vetter, Alan Edelman

Abstract: The Julia programming language has evolved into a modern alternative to fill existing gaps in scientific computing and data science applications. Julia leverages a unified and coordinated single-language and ecosystem paradigm and has a proven track record of achieving high performance without sacrificing user productivity. These aspects make Julia a viable alternative to high-performance computing's (HPC's) existing and increasingly costly many-body workflow composition strategy in which traditional HPC languages (e.g., Fortran, C, C++) are used for simulations, and higher-level languages (e.g., Python, R, MATLAB) are used for data analysis and interactive computing. Julia's rapid growth in language capabilities, package ecosystem, and community make it a promising universal language for HPC. This paper presents the views of a multidisciplinary group of researchers from academia, government, and industry that advocate for an HPC software development paradigm that emphasizes developer productivity, workflow portability, and low barriers for entry. We believe that the Julia programming language, its ecosystem, and its community provide modern and powerful capabilities that enable this group's objectives. Crucially, we believe that Julia can provide a feasible and less costly approach to programming scientific applications and workflows that target HPC facilities. In this work, we examine the current practice and role of Julia as a common, end-to-end programming model to address major challenges in scientific reproducibility, data-driven AI/machine learning, co-design and workflows, scalability and performance portability in heterogeneous computing, network communication, data management, and community education. As a result, the diversification of current investments to fulfill the needs of the upcoming decade is crucial as more supercomputing centers prepare for the exascale era.

Submitted 10 November, 2022; v1 submitted 4 November, 2022; originally announced November 2022.

Comments: 20 pages; improved image quality

arXiv:2209.02863 [pdf]

doi 10.3847/2041-8213/aca1b0

Model-based cross-correlation search for gravitational waves from the low-mass X-ray binary Scorpius X-1 in LIGO O3 data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1670 additional authors not shown)

Abstract: We present the results of a model-based search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1 using LIGO detector data from the third observing run of Advanced LIGO, Advanced Virgo and KAGRA. This is a semicoherent search which uses details of the signal model to coherently combine data separated by less than a specified coherence time, which can be adjusted to balance sensitivity with computing cost. The search covered a range of gravitational-wave frequencies from 25 Hz to 1600 Hz, as well as ranges in orbital speed, frequency and phase determined from observational constraints. No significant detection candidates were found, and upper limits were set as a function of frequency. The most stringent limits, between 100 Hz and 200 Hz, correspond to an amplitude h0 of about 1e-25 when marginalized isotropically over the unknown inclination angle of the neutron star's rotation axis, or less than 4e-26 assuming the optimal orientation. The sensitivity of this search is now probing amplitudes predicted by models of torque balance equilibrium. For the usual conservative model assuming accretion at the surface of the neutron star, our isotropically-marginalized upper limits are close to the predicted amplitude from about 70 Hz to 100 Hz; the limits assuming the neutron star spin is aligned with the most likely orbital angular momentum are below the conservative torque balance predictions from 40 Hz to 200 Hz. Assuming a broader range of accretion models, our direct limits on gravitational-wave amplitude delve into the relevant parameter space over a wide range of frequencies, to 500 Hz or more.

Submitted 2 January, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: 19 pages, Open Access Journal PDF

Report number: LIGO-P2100110-v13

Journal ref: The Astrophysical Journal Letters, 941, L30 (2022)

arXiv:2209.02610 [pdf, ps, other]

A perspective to navigate the National Laboratory environment for RSE career growth

Authors: William F Godoy

Abstract: This paper shares a perspective for the research software engineering (RSE) community to navigate the National Laboratory landscape. The RSE role is a recent concept that has led to organizational challenges in placing RSEs and evaluating their impact, costs and benefits. The premise is that RSEs are a natural fit into the current landscape and can use traditional career growth strategies in science: publications, community engagements and proposals. Projects funding RSEs can benefit from this synergy and be inclusive of traditional activities. Still, a great deal of introspection is needed to close gaps between the rapidly evolving RSE landscape and the well-established communication patterns in science. This perspective is built upon interactions in industry, academia and government in high-performance computing (HPC) environments. The goal is to contribute to the conversation around RSE career growth and understand their return on investment for scientific projects and sponsors.

Submitted 6 September, 2022; originally announced September 2022.

Comments: 2 pages, paper presented at the RSE-HPC workshop https://us-rse.org/rse-hpc-2022/ , part of Supercomputing 2022 https://sc22.supercomputing.org/

arXiv:2206.00108 [pdf, other]

doi 10.1109/IPDPSW55747.2022.00153

Modeling pre-Exascale AMR Parallel I/O Workloads via Proxy Applications

Authors: William F Godoy, Jenna Delozier, Gregory R Watson

Abstract: The present work investigates the modeling of pre-exascale input/output (I/O) workloads of Adaptive Mesh Refinement (AMR) simulations through a simple proxy application. We collect data from the AMReX Castro framework running on the Summit supercomputer for a wide range of scales and mesh partitions for the hydrodynamic Sedov case as a baseline to provide sufficient coverage to the formulated proxy model. The non-linear analysis data production rates are quantified as a function of a set of input parameters such as output frequency, grid size, number of levels, and the Courant-Friedrichs-Lewy (CFL) condition number for each rank, mesh level and simulation time step. Linear regression is then applied to formulate a simple analytical model that translates AMReX inputs into MACSio proxy I/O application parameters, resulting in a simple "kernel" approximation for data production at each time step. Results show that MACSio can simulate actual AMReX non-linear "static" I/O workloads to a certain degree of confidence on the Summit supercomputer using the present methodology. The goal is to provide an initial level of understanding of AMR I/O workloads via lightweight proxy application models to facilitate autotuning of data management strategies in anticipation of exascale systems.

Submitted 31 May, 2022; originally announced June 2022.

Comments: 10 pages, 11 figures, accepted at Seventeenth International Workshop on Automatic Performance Tuning, iWAPT2022, held in conjunction with IEEE IPDPS 2022

arXiv:2204.05896 [pdf, ps, other]

doi 10.1007/978-3-031-08760-8_46

A Survey on Sustainable Software Ecosystems to Support Experimental and Observational Science at Oak Ridge National Laboratory

Authors: David E Bernholdt, Mathieu Doucet, William F Godoy, Addi Malviya-Thakur, Gregory R Watson

Abstract: In the search for a sustainable approach for software ecosystems that supports experimental and observational science (EOS) across Oak Ridge National Laboratory (ORNL), we conducted a survey to understand the current and future landscape of EOS software and data. This paper describes the survey design we used to identify significant areas of interest, gaps, and potential opportunities, followed by a discussion on the obtained responses. The survey formulates questions about project demographics, technical approach, and skills required for the present and the next five years. The study was conducted among 38 ORNL participants between June and July of 2021 and followed the required guidelines for human subjects training. We plan to use the collected information to help guide a vision for sustainable, community-based, and reusable scientific software ecosystems that need to adapt effectively to: i) the evolving landscape of heterogeneous hardware in the next generation of instruments and computing (e.g. edge, distributed, accelerators), and ii) data management requirements for data-driven science using artificial intelligence.

Submitted 12 April, 2022; originally announced April 2022.

Comments: 14 pages, no figures, only tables

Journal ref: ICCS 2022, SE4Science Workshop

arXiv:2112.00228 [pdf, other]

doi 10.1109/BigData52589.2021.9671354

Efficient loading of reduced data ensembles produced at ORNL SNS/HFIR neutron time-of-flight facilities

Authors: William F Godoy, Andrei T Savici, Steven E Hahn, Peter F Peterson

Abstract: We present algorithmic improvements to the loading operations of certain reduced data ensembles produced from neutron scattering experiments at Oak Ridge National Laboratory (ORNL) facilities. Ensembles from multiple measurements are required to cover a wide range of the phase space of a sample material of interest. They are stored using the standard NeXus schema on individual HDF5 files. This makes it a scalability challenge, as the number of experiments stored increases in a single ensemble file. The present work follows up on our previous efforts on data management algorithms, to address identified input/output (I/O) bottlenecks in Mantid, an open-source data analysis framework used across several neutron science facilities around the world. We reuse an in-memory binary-tree metadata index that resembles data access patterns, to provide a scalable search and extraction mechanism. In addition, several memory operations are refactored and optimized for the current common use cases, ranging most frequently from 10 to 180, and up to 360 separate measurement configurations. Results from this work show consistent speedups in wall-clock time on the Mantid LoadMD routine, ranging from 19% to 23% on average, on ORNL production computing systems. The latter depends on the complexity of the targeted instrument-specific data and the system I/O and compute variability for the shared computational resources available to users of ORNL's Spallation Neutron Source (SNS) and the High Flux Isotope Reactor (HFIR) instruments. Nevertheless, we continue to highlight the need for more research to address reduction challenges as experimental data volumes, user time and processing costs increase.

Submitted 30 November, 2021; originally announced December 2021.

Comments: 7 pages, 6 figures, 4 tables, The Second International Workshop on Big Data Reduction held with 2021 IEEE International Conference on Big Data

arXiv:2107.06108 [pdf]

doi 10.1007/978-3-030-96498-6_6

Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2

Authors: Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl

Abstract: This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need of strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.

Submitted 19 January, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: 18 pages, 9 figures, SMC2021, supplementary material at https://zenodo.org/record/4906276

arXiv:2102.00507 [pdf, ps, other]

Radiating jump conditions in General Relativity

Authors: L. F Castañeda-Godoy, J. Ospino, L. A. Núñez

Abstract: We present a unified description of spherical discontinuity surfaces in General Relativity based on two parameters: mass function and surface permeability. The surfaces considered are: Impulsive fronts, massive permeable layer; Surface layers, massive impermeable layer; Shock fronts, massless permeable surface; and Boundary surfaces, massless impermeable surface. We derive the exact jump conditions for the physical variables across all these surfaces. Finally, we discuss the quasi-static approximation from studying slow hydrodynamic processes involving discontinuity surfaces.

Submitted 31 January, 2021; originally announced February 2021.

arXiv:2101.02591 [pdf, other]

doi 10.1109/BigData50022.2020.9377836

Efficient Data Management in Neutron Scattering Data Reduction Workflows at ORNL

Authors: William F Godoy, Peter F Peterson, Steven E Hahn, Jay J Billings

Abstract: Oak Ridge National Laboratory (ORNL) experimental neutron science facilities produce 1.2 TB a day of raw event-based data that is stored using the standard metadata-rich NeXus schema built on top of the HDF5 file format. Performance of several data reduction workflows is largely determined by the amount of time spent on the loading and processing algorithms in Mantid, an open-source data analysis framework used across several neutron science facilities around the world. The present work introduces new data management algorithms to address identified input/output (I/O) bottlenecks in Mantid. First, we introduce an in-memory binary-tree metadata index that resembles NeXus data access patterns to provide a scalable search and extraction mechanism. Second, data encapsulation in Mantid algorithms is optimally redesigned to reduce the total compute and memory runtime footprint associated with metadata I/O reconstruction tasks. Results from this work show speedups in wall-clock time on ORNL data reduction workflows, ranging from 11% to 30% depending on the complexity of the targeted instrument-specific data. Nevertheless, we highlight the need for more research to address reduction challenges as experimental data volumes increase.

Submitted 5 January, 2021; originally announced January 2021.

Comments: 7 pages, 4 figures, International Workshop on Big Data Reduction held with 2020 IEEE International Conference on Big Data

arXiv:2011.11773 [pdf, other]

Advancing Humor-Focused Sentiment Analysis through Improved Contextualized Embeddings and Model Architecture

Authors: Felipe Godoy

Abstract: Humor is a natural and fundamental component of human interactions. When correctly applied, humor allows us to express thoughts and feelings conveniently and effectively, increasing interpersonal affection, likeability, and trust. However, understanding the use of humor is a computationally challenging task from the perspective of humor-aware language processing models. As language models become ubiquitous through virtual assistants and IoT devices, the need to develop humor-aware models rises exponentially. To further improve the state-of-the-art capacity to perform this particular sentiment-analysis task, we must explore models that incorporate contextualized and nonverbal elements in their design. Ideally, we seek architectures accepting nonverbal elements as additional embedded inputs to the model, alongside the original sentence-embedded input. This survey thus analyses the current state of research in techniques for improved contextualized embedding incorporating nonverbal information, as well as newly proposed deep architectures to improve context retention on top of popular word-embedding methods.

Submitted 23 November, 2020; originally announced November 2020.

arXiv:2009.14778 [pdf]

doi 10.1016/j.jallcom.2020.158320

Effects of Reducing Heat Treatment on the Structural and the Magnetic Properties of Mn:ZnO Ceramics

Authors: V. M. Almeida Lage, R. T. da Silva, A. Mesquita, M. P. F. de Godoy, X. Gratens, V. A. Chitta, H. B. de Carvalho

Abstract: Polycrystalline bulk Mn:ZnO ceramics with Mn nominal concentrations of 6, 11, 17 and 22 at.% were prepared through the solid-state reaction method and subjected to a heat treatment in a reducing atmosphere (Ar (95%) and H2 (5%)). The samples were studied with particular emphasis on their compositional, structural, and magnetic properties. A detailed microstructural and chemical analysis confirms the Mn doping of the wurtzite ZnO structure mainly at the surface of the ZnO grains. For the samples with higher Mn content, the secondary phases ZnMn2O4 and Mn1-xZnxO (Zn-doped MnO) were detected for the as-prepared and the heat-treated samples, respectively. The structural change of the secondary phases under heat treatment, from ZnMn2O4 to Mn1-xZnxO, confirms the effectiveness of the heat treatment in reducing the valence of the metallic ions and in forming oxygen vacancies in the system. In spite of the induced defects, the magnetic analysis presents only a paramagnetic behavior with an antiferromagnetic coupling between the Mn ions. In the context of the bound magnetic polaron theory, it is concluded that oxygen vacancies are not the necessary defect to promote the desired ferromagnetic order at room temperature.

Submitted 30 September, 2020; originally announced September 2020.

Comments: 19 pages; 7 figures

Journal ref: Journal of Alloys and Compounds 863 (2021) 158320

arXiv:2007.14140 [pdf]

doi 10.1016/j.jallcom.2020.157772

Defect Induced Room Temperature Ferromagnetism in High Quality Co-doped ZnO Bulk Samples

Authors: M. P. F. de Godoy, X. Gratens, V. A. Chitta, A. Mesquita, M. M de Lima Jr., A. Cantarero, G. Rahman, J. M. Morbec, H. B. de Carvalho

Abstract: The nature of the often reported room temperature ferromagnetism in transition metal doped oxides is still a matter of huge debate. Herein we report on room temperature ferromagnetism in high quality Co-doped ZnO (Zn1-xCoxO) bulk samples synthesized via standard solid-state reaction route. Reference paramagnetic Co-doped ZnO samples with low level of structural defects are subjected to heat treatments in a reductive atmosphere in order to introduce defects in the samples in a controlled way. A detailed structural analysis is carried out in order to characterize the induced defects and their concentration. The magnetometry revealed the coexistence of a paramagnetic and a ferromagnetic phase at room temperature in straight correlation with the structural properties. The saturation magnetization is found to increase with the intensification of the heat treatment, and, therefore, with the increase of the density of induced defects. The magnetic behavior is fully explained in terms of the bound magnetic polaron model. Based on the experimental findings, supported by theoretical calculations, we attribute the origin of the observed defect-induced-ferromagnetism to the ferromagnetic coupling between the Co ions mediated by magnetic polarons due to zinc interstitial defects.

Submitted 28 July, 2020; originally announced July 2020.

Comments: 33 pages, 9 figures

Journal ref: Journal of Alloys and Compounds 859 (2021) 157772

arXiv:1703.07383 [pdf, ps, other]

doi 10.1016/j.cnsns.2017.10.018

Mathematical Model with Autoregressive Process for Electrocardiogram Signals

Authors: Ronaldo M Evaristo, Antonio M Batista, Ricardo L Viana, Kelly C Iarosz, José D Szezech Jr, Moacir F de Godoy

Abstract: The cardiovascular system is composed of the heart, blood and blood vessels. Regarding the heart, cardiac conditions are determined by the electrocardiogram, which is a noninvasive medical procedure. In this work, we propose an autoregressive process in a mathematical model based on coupled differential equations in order to model electrocardiogram signals. Our results are compared with the experimental tachogram by means of the Poincaré plot and detrended fluctuation analysis. We verify that the results from the model with the autoregressive process show good agreement with experimental measures from the tachogram generated by the electrical activity of the heartbeat. With the tachogram we build the electrocardiogram by means of coupled differential equations.

Submitted 6 November, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

Journal ref: Communications in Nonlinear Science and Numerical Simulation, Volume 57, Pages 415-421, 2018

Showing 1–26 of 26 results for author: Godoy, F