-
Collaborative Experience between Scientific Software Projects using Agile Scrum Development
Authors:
A. L. Baxter,
S. Y. BenZvi,
W. Bonivento,
A. Brazier,
M. Clark,
A. Coleiro,
D. Collom,
M. Colomer-Molla,
B. Cousins,
A. Delgado Orellana,
D. Dornic,
V. Ekimtcov,
S. ElSayed,
A. Gallo Rosso,
P. Godwin,
S. Griswold,
A. Habig,
S. Horiuchi,
D. A. Howell,
M. W. G. Johnson,
M. Juric,
J. P. Kneller,
A. Kopec,
C. Kopper,
V. Kulikovskiy, et al. (27 additional authors not shown)
Abstract:
Developing sustainable software for the scientific community requires expertise in software engineering and domain science. This can be challenging due to the unique needs of scientific software, the insufficient resources for software engineering practices in the scientific community, and the complexity of developing for evolving scientific contexts. While open-source software can partially address these concerns, it can introduce complicating dependencies and delay development. These issues can be reduced if scientists and software developers collaborate. We present a case study wherein scientists from the SuperNova Early Warning System collaborated with software developers from the Scalable Cyberinfrastructure for Multi-Messenger Astrophysics project. The collaboration addressed the difficulties of open-source software development, but presented additional risks to each team. For the scientists, there was a concern of relying on external systems and lacking control in the development process. For the developers, there was a risk in supporting a user-group while maintaining core development. These issues were mitigated by creating a second Agile Scrum framework in parallel with the developers' ongoing Agile Scrum process. This Agile collaboration promoted communication, ensured that the scientists had an active role in development, and allowed the developers to evaluate and implement the scientists' software requirements. The collaboration provided benefits for each group: the scientists actuated their development by using an existing platform, and the developers utilized the scientists' use-case to improve their systems. This case study suggests that scientists and software developers can avoid scientific computing issues by collaborating and that Agile Scrum methods can address emergent concerns.
Submitted 2 August, 2022; v1 submitted 19 January, 2021;
originally announced January 2021.
-
Checkpoint, Restore, and Live Migration for Science Platforms
Authors:
Mario Juric,
Steven Stetzler,
Colin T. Slater
Abstract:
We demonstrate a fully functional implementation of (per-user) checkpoint, restore, and live migration capabilities for JupyterHub platforms. Checkpointing -- the ability to freeze and suspend to disk the running state (contents of memory, registers, open files, etc.) of a set of processes -- enables the system to snapshot a user's Jupyter session to permanent storage. The restore functionality brings a checkpointed session back to a running state, to continue where it left off at a later time and potentially on a different machine. Finally, live migration enables moving running Jupyter notebook servers between different machines, transparently to the analysis code and without disconnecting the user. Our implementation of these capabilities works at the system level, with few limitations, and typical checkpoint/restore times of O(10s) with a pathway to O(1s) live migrations. It opens a myriad of interesting use cases, especially for cloud-based deployments: from checkpointing idle sessions without interrupting the user's work (achieving cost reductions of 4x or more), to execution on spot instances with transparent migration on eviction (with additional cost reductions of up to 3x), to automated migration of workloads to ideally suited instances (e.g. moving an analysis to a machine with more or less RAM or cores based on observed resource utilization). The capabilities we demonstrate can make science platforms fully elastic while retaining an excellent user experience.
Submitted 14 January, 2021;
originally announced January 2021.
-
AI safety: state of the field through quantitative lens
Authors:
Mislav Juric,
Agneza Sandic,
Mario Brcic
Abstract:
The last decade has seen major improvements in the performance of artificial intelligence, which has driven widespread applications. Unforeseen effects of such mass adoption have put the notion of AI safety into the public eye. AI safety is a relatively new field of research focused on techniques for building AI beneficial for humans. While there exist survey papers for the field of AI safety, there is a lack of a quantitative look at the research being conducted. The quantitative aspect gives data-driven insight into the emerging trends, knowledge gaps and potential areas for future research. In this paper, bibliometric analysis of the literature finds a significant increase in research activity since 2015. Also, the field is so new that most of the technical issues are open, including explainability with its long-term utility, and value alignment, which we have identified as the most important long-term research topic. Equally, there is a severe lack of research into concrete policies regarding AI. As we expect AI to be one of the main driving forces of changes in society, AI safety is the field under which we need to decide the direction of humanity's future.
Submitted 9 July, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
AXS: A framework for fast astronomical data processing based on Apache Spark
Authors:
Petar Zečević,
Colin T. Slater,
Mario Jurić,
Andrew J. Connolly,
Sven Lončarić,
Eric C. Bellm,
V. Zach Golkhou,
Krzysztof Suberlak
Abstract:
We introduce AXS (Astronomy eXtensions for Spark), a scalable open-source astronomical data analysis framework built on Apache Spark, a widely used industry-standard engine for big data processing. Building on capabilities present in Spark, AXS aims to enable querying and analyzing almost arbitrarily large astronomical catalogs using familiar Python/AstroPy concepts, DataFrame APIs, and SQL statements. We achieve this by i) adding support to Spark for efficient on-line positional cross-matching and ii) supplying a Python library supporting commonly used operations for astronomical data analysis. To support scalable cross-matching, we developed a variant of the ZONES algorithm (Gray et al. 2004) capable of operating in a distributed, shared-nothing architecture. We couple this to a data partitioning scheme that enables fast catalog cross-matching and handles the data skew often present in deep all-sky data sets. The cross-match and other often-used functionalities are exposed to the end users through an easy-to-use Python API. We demonstrate AXS' technical and scientific performance on SDSS, ZTF, Gaia DR2, and AllWise catalogs. Using AXS we were able to perform an on-the-fly cross-match of the Gaia DR2 (1.8 billion rows) and AllWise (900 million rows) data sets in ~30 seconds. We discuss how cloud-ready distributed systems like AXS provide a natural way to enable comprehensive end-user analyses of large datasets such as LSST.
Submitted 24 May, 2019; v1 submitted 22 May, 2019;
originally announced May 2019.
-
Swarm-NG: a CUDA Library for Parallel n-body Integrations with focus on Simulations of Planetary Systems
Authors:
Saleh Dindar,
Eric B. Ford,
Mario Juric,
Young In Yeo,
Jianwei Gao,
Aaron C. Boley,
Benjamin Nelson,
Jorg Peters
Abstract:
We present Swarm-NG, a C++ library for the efficient direct integration of many n-body systems using highly parallel Graphics Processing Units (GPUs), such as NVIDIA's Tesla T10 and M2070 GPUs. While previous studies have demonstrated the benefit of GPUs for n-body simulations with thousands to millions of bodies, Swarm-NG focuses on many few-body systems, e.g., thousands of systems with 3...15 bodies each, as is typical for the study of planetary systems. Swarm-NG parallelizes the simulation, including both the numerical integration of the equations of motion and the evaluation of forces, using NVIDIA's "Compute Unified Device Architecture" (CUDA) on the GPU. Swarm-NG includes optimized implementations of 4th order time-symmetrized Hermite integration and mixed variable symplectic integration, as well as several sample codes for other algorithms to illustrate how non-CUDA-savvy users may themselves introduce customized integrators into the Swarm-NG framework. To optimize performance, we analyze the effect of GPU-specific parameters on performance under double precision.
Applications of Swarm-NG include studying the late stages of planet formation, testing the stability of planetary systems and evaluating the goodness-of-fit between many planetary system models and observations of extrasolar planet host stars (e.g., radial velocity, astrometry, transit timing). While Swarm-NG focuses on the parallel integration of many planetary systems, the underlying integrators could be applied to a wide variety of problems that require repeatedly integrating a set of ordinary differential equations using different initial conditions and/or parameter values.
Submitted 24 September, 2012; v1 submitted 6 August, 2012;
originally announced August 2012.