Showing 1–15 of 15 results for author: Laguna, I

  1. arXiv:2410.09191

    cs.SE cs.PF cs.PL

    Testing the Unknown: A Framework for OpenMP Testing via Random Program Generation

    Authors: Ignacio Laguna, Patrick Chapman, Konstantinos Parasyris, Giorgis Georgakoudis, Cindy Rubio-González

    Abstract: We present a randomized differential testing approach to test OpenMP implementations. In contrast to previous work that manually creates dozens of verification and validation tests, our approach is able to randomly generate thousands of tests, exposing OpenMP implementations to a wide range of program behaviors. We represent the space of possible random OpenMP tests using a grammar and implement o…

    Submitted 11 October, 2024; originally announced October 2024.

  2. arXiv:2410.09172

    math.NA cs.PL

    Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs

    Authors: Anwar Hossain Zahid, Ignacio Laguna, Wei Le

    Abstract: As scientific codes are ported between GPU platforms, continuous testing is required to ensure numerical robustness and identify numerical differences. Compiler-induced numerical differences occur when a program is compiled and run on different GPUs, and the numerical outcomes are different for the same input. We present a study of compiler-induced numerical differences between NVIDIA and AMD GPUs…

    Submitted 11 October, 2024; originally announced October 2024.

  3. arXiv:2403.00232

    cs.AR

    FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

    Authors: Xinyi Li, Ang Li, Bo Fang, Katarzyna Swirydowicz, Ignacio Laguna, Ganesh Gopalakrishnan

    Abstract: NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, their numerical behaviors are not publicly documented, including the number of extra precision bits maintained, the accumulation order of addition, and predictable subnormal number handling during c…

    Submitted 29 February, 2024; originally announced March 2024.

  4. arXiv:2311.05782

    cs.DC

    MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications

    Authors: Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, Ang Li

    Abstract: Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is the support of mixed-precision enabled GEMM. For DNN models, lower-precision FP data formats and computation offer acceptable correctness but significan…

    Submitted 9 November, 2023; originally announced November 2023.

  5. Giving RSEs a Larger Stage through the Better Scientific Software Fellowship

    Authors: William F. Godoy, Ritu Arora, Keith Beattie, David E. Bernholdt, Sarah E. Bratt, Daniel S. Katz, Ignacio Laguna, Amiya K. Maji, Addi Malviya Thakur, Rafael M. Mudafort, Nitin Sukhija, Damian Rouson, Cindy Rubio-González, Karan Vahi

    Abstract: The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. BSSwF's vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software production and sustainability. Over the last fiv…

    Submitted 14 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: submitted to Computing in Science & Engineering (CiSE), Special Issue on the Future of Research Software Engineers in the US

  6. arXiv:2102.06896

    cs.DC

    Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance

    Authors: Giorgis Georgakoudis, Luanzheng Guo, Ignacio Laguna

    Abstract: Scaling supercomputers comes with an increase in failure rates due to the increasing number of hardware components. In standard practice, applications are made resilient through checkpointing data and restarting execution after a failure occurs to resume from the latest checkpoint. However, re-deploying an application incurs overhead by tearing down and re-instating execution, and possibly limiti…

    Submitted 13 February, 2021; originally announced February 2021.

    Comments: International Conference on High Performance Computing (ISC 2020)

  7. arXiv:2102.06894

    cs.DC

    MATCH: An MPI Fault Tolerance Benchmark Suite

    Authors: Luanzheng Guo, Giorgis Georgakoudis, Konstantinos Parasyris, Ignacio Laguna, Dong Li

    Abstract: MPI has been ubiquitously deployed in flagship HPC systems aiming to accelerate distributed scientific applications running on tens of hundreds of processes and compute nodes. Maintaining the correctness and integrity of MPI application execution is critical, especially for safety-critical scientific applications. Therefore, a collection of effective MPI fault tolerance techniques have been propos…

    Submitted 13 February, 2021; originally announced February 2021.

    Journal ref: IEEE International Symposium on Workload Characterization (IISWC 2020)

  8. arXiv:2102.01687

    cs.LG cs.AI cs.PL cs.SE

    Report of the Workshop on Program Synthesis for Scientific Computing

    Authors: Hal Finkel, Ignacio Laguna

    Abstract: Program synthesis is an active research field in academia, national labs, and industry. Yet, work directly applicable to scientific computing, while having some impressive successes, has been limited. This report reviews the relevant areas of program synthesis work for scientific computing, discusses successes to date, and outlines opportunities for future work. This report is the result of the Wo…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 29 pages, workshop website: https://prog-synth-science.github.io/2020/

  9. arXiv:1812.02944

    cs.DC

    PARIS: Predicting Application Resilience Using Machine Learning

    Authors: Luanzheng Guo, Dong Li, Ignacio Laguna

    Abstract: Extreme-scale scientific applications can be more vulnerable to soft errors (transient faults) as high-performance computing systems increase in scale. The common practice to evaluate the resilience to faults of an application is random fault injection, a method that can be highly time consuming. While resilience prediction modeling has been recently proposed to predict application resilience in a…

    Submitted 7 December, 2018; originally announced December 2018.

  10. Multi-level analysis of compiler induced variability and performance tradeoffs

    Authors: Michael Bentley, Ian Briggs, Ganesh Gopalakrishnan, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Holger E. Jones

    Abstract: Successful HPC software applications are long-lived. When ported across machines and their compilers, these applications often produce different numerical results, many of which are unacceptable. Such variability is also a concern while optimizing the code more aggressively to gain performance. Efficient tools that help locate the program units (files and functions) within which most of the variab…

    Submitted 24 June, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: 12 pages, 11 figures, accepted in HPDC 2019

    Report number: LLNL-CONF-759867

  11. arXiv:1809.01362

    cs.DC

    FlipTracker: Understanding Natural Error Resilience in HPC Applications

    Authors: Luanzheng Guo, Dong Li, Ignacio Laguna, Martin Schulz

    Abstract: As high-performance computing systems scale in size and computational power, the danger of silent errors, i.e., errors that can bypass hardware detection mechanisms and impact application state, grows dramatically. Consequently, applications running on HPC systems need to exhibit resilience to such errors. Previous work has found that, for certain codes, this resilience can come for free, i.e., so…

    Submitted 5 September, 2018; originally announced September 2018.

    Report number: LLNL-CONF-748619

  12. arXiv:1705.07478

    cs.DC

    Report of the HPC Correctness Summit, Jan 25--26, 2017, Washington, DC

    Authors: Ganesh Gopalakrishnan, Paul D. Hovland, Costin Iancu, Sriram Krishnamoorthy, Ignacio Laguna, Richard A. Lethin, Koushik Sen, Stephen F. Siegel, Armando Solar-Lezama

    Abstract: Maintaining leadership in HPC requires the ability to support simulations at large scales and fidelity. In this study, we detail one of the most significant productivity challenges in achieving this goal, namely the increasing proclivity to bugs, especially in the face of growing hardware and software heterogeneity and sheer system scale. We identify key areas where timely new research must be pro…

    Submitted 21 May, 2017; originally announced May 2017.

    Comments: 57 pages

  13. arXiv:1611.00037

    astro-ph.IM astro-ph.CO

    The DESI Experiment Part II: Instrument Design

    Authors: DESI Collaboration, Amir Aghamousa, Jessica Aguilar, Steve Ahlen, Shadab Alam, Lori E. Allen, Carlos Allende Prieto, James Annis, Stephen Bailey, Christophe Balland, Otger Ballester, Charles Baltay, Lucas Beaufore, Chris Bebek, Timothy C. Beers, Eric F. Bell, José Luis Bernal, Robert Besuner, Florian Beutler, Chris Blake, Hannes Bleuler, Michael Blomqvist, Robert Blum, Adam S. Bolton, Cesar Briceno, et al. (268 additional authors not shown)

    Abstract: DESI (Dark Energy Spectroscopic Instrument) is a Stage IV ground-based dark energy experiment that will study baryon acoustic oscillations and the growth of structure through redshift-space distortions with a wide-area galaxy and quasar redshift survey. The DESI instrument is a robotically-actuated, fiber-fed spectrograph capable of taking up to 5,000 simultaneous spectra over a wavelength range from…

    Submitted 13 December, 2016; v1 submitted 31 October, 2016; originally announced November 2016.

  14. arXiv:1611.00036

    astro-ph.IM astro-ph.CO

    The DESI Experiment Part I: Science, Targeting, and Survey Design

    Authors: DESI Collaboration, Amir Aghamousa, Jessica Aguilar, Steve Ahlen, Shadab Alam, Lori E. Allen, Carlos Allende Prieto, James Annis, Stephen Bailey, Christophe Balland, Otger Ballester, Charles Baltay, Lucas Beaufore, Chris Bebek, Timothy C. Beers, Eric F. Bell, José Luis Bernal, Robert Besuner, Florian Beutler, Chris Blake, Hannes Bleuler, Michael Blomqvist, Robert Blum, Adam S. Bolton, Cesar Briceno, et al. (268 additional authors not shown)

    Abstract: DESI (Dark Energy Spectroscopic Instrument) is a Stage IV ground-based dark energy experiment that will study baryon acoustic oscillations (BAO) and the growth of structure through redshift-space distortions with a wide-area galaxy and quasar redshift survey. To trace the underlying dark matter distribution, spectroscopic targets will be selected in four classes from imaging data. We will measure…

    Submitted 13 December, 2016; v1 submitted 31 October, 2016; originally announced November 2016.

  15. arXiv:1509.00548

    astro-ph.IM

    Development of the photomultiplier tube readout system for the first Large-Sized Telescope of the Cherenkov Telescope Array

    Authors: Shu Masuda, Yusuke Konno, Juan Abel Barrio, Oscar Blanch Bigas, Carlos Delgado, Lluís Freixas Coromina, Shuichi Gunji, Daniela Hadasch, Kenichiro Hatanaka, Masahiro Ikeno, Jose Maria Illa Laguna, Yusuke Inome, Kazuma Ishio, Hideaki Katagiri, Hidetoshi Kubo, Gustavo Martínez, Daniel Mazin, Daisuke Nakajima, Takeshi Nakamori, Hideyuki Ohoka, Riccardo Paoletti, Stefan Ritt, Andrea Rugliancich, Takayuki Saito, Karl-Heinz Sulanke, et al. (9 additional authors not shown)

    Abstract: The Cherenkov Telescope Array (CTA) is the next generation ground-based very high energy gamma-ray observatory. The Large-Sized Telescope (LST) of CTA targets 20 GeV–1 TeV gamma rays and has 1855 photomultiplier tubes (PMTs) installed in the focal plane camera. With the 23 m mirror dish, the night sky background (NSB) rate amounts to several hundred MHz per pixel. In order to record clean imag…

    Submitted 1 September, 2015; originally announced September 2015.

    Comments: In Proceedings of the 34th International Cosmic Ray Conference (ICRC2015), The Hague, The Netherlands. All CTA contributions at arXiv:1508.05894