-
A Framework to capture and reproduce the Absolute State of Jupyter Notebooks
Authors:
Dimuthu Wannipurage,
Suresh Marru,
Marlon Pierce
Abstract:
Jupyter Notebooks are an enormously popular tool for creating and narrating computational research projects. They also have enormous potential for creating reproducible scientific research artifacts. Capturing the complete state of a notebook has additional benefits; for instance, the notebook execution may be split between local and remote resources, where the latter may have more powerful processing capabilities or store large or access-limited data. There are several challenges for making notebooks fully reproducible when examined in detail. The notebook code must be replicated entirely, and the underlying Python runtime environments must be identical. More subtle problems arise in replicating referenced data, external library dependencies, and runtime variable states. This paper presents solutions to these problems using Jupyter's standard extension mechanisms to create an archivable system state for a running notebook. We show that these additional mechanisms, which involve interacting with the underlying Linux kernel, do not introduce substantial execution time overhead, demonstrating the approach's feasibility.
Submitted 31 March, 2022;
originally announced April 2022.
-
Experiences with managing data parallel computational workflows for High-throughput Fragment Molecular Orbital (FMO) Calculations
Authors:
Dimuthu Wannipurage,
Indrajit Deb,
Eroma Abeysinghe,
Sudhakar Pamidighantam,
Suresh Marru,
Marlon Pierce,
Aaron T. Frank
Abstract:
Fragment Molecular Orbital (FMO) calculations provide a framework to speed up quantum mechanical calculations and so can be used to explore structure-energy relationships in large and complex biomolecular systems. These calculations are still onerous, especially when applied to large sets of molecules. Therefore, cyberinfrastructure is needed that provides mechanisms and user interfaces to manage job submissions, failed job resubmissions, data retrieval, and data storage for these calculations. Motivated by the need to rapidly identify drugs that are likely to bind to targets implicated in SARS-CoV-2, the virus that causes COVID-19, we developed a static parameter sweeping framework with Apache Airavata middleware to apply to complexes formed between SARS-CoV-2 M-pro (the main protease in SARS-CoV-2) and 2820 small molecules in a drug-repurposing library. Here we describe the implementation of our framework for managing the executions of the high-throughput FMO calculations. The approach is general and so should find utility in large-scale FMO calculations on biomolecular systems.
Submitted 28 January, 2022;
originally announced January 2022.
-
Experiences with Integrating Custos Security Services
Authors:
Isuru Ranawaka,
Samitha Liyanage,
Dannon Baker,
Alexandru Mahmoud,
Juleen Graham,
Terry Fleury,
Dimuthu Wannipurage,
Yu Ma,
Enis Afgan,
Jim Basney,
Suresh Marru,
Marlon Pierce
Abstract:
Science gateways are user-facing cyberinfrastructure that provide researchers and educators with Web-based access to scientific software, computing, and data resources. Managing user identities, accounts, and permissions are essential tasks for science gateways, and gateways likewise must manage secure connections between their middleware and remote resources. The Custos project is an effort to build open source software that can be operated as a multi-tenanted service that provides reliable implementations of common science gateway cybersecurity needs, including federated authentication, identity management, group and authorization management, and resource credential management. Custos aims further to provide integrated solutions through these capabilities, delivering end-to-end support for several science gateway usage scenarios. This paper examines four deployment scenarios using Custos and associated extensions beyond previously described work. The first capability illustrated by these scenarios is the need for Custos to provide hierarchical tenant management that allows multiple gateway deployments to be federated together and also to support consolidated, hosted science gateway platform services. The second capability illustrated by these scenarios is the need to support service accounts that can support non-browser applications and agent applications that can act on behalf of users on edge resources. We illustrate how the latter can be built using Web security standards combined with Custos permission management mechanisms.
Submitted 8 July, 2021;
originally announced July 2021.
-
A Multi-Protocol, Secure, and Dynamic Data Storage Integration Framework for Multi-tenanted Science Gateway Middleware
Authors:
Dimuthu Wannipurage,
Isuru Ranawaka,
Eroma Abeysinghe,
Marcus Christie,
Suresh Marru,
Marlon Pierce
Abstract:
Science gateways are user-centric, end-to-end cyberinfrastructure for managing scientific data and executions of computational software on distributed resources. In order to simplify the creation and management of science gateways, we have pursued a multi-tenanted, platform-as-a-service approach that allows multiple gateway front-ends (portals) to be integrated with a consolidated middleware that manages the movement of data and the execution of workflows on multiple back-end scientific computing resources. An important challenge for this approach is to provide an end-to-end data movement and management solution that allows gateway users to integrate their own data stores with the gateway platform. These user-provided data stores may include commercial cloud-based object store systems, third-party data stores accessed through APIs such as REST endpoints, and users' own local storage resources. In this paper, we present a solution design and implementation based on the integration of a managed file transfer (MFT) service (Airavata MFT) into the platform.
Submitted 8 July, 2021;
originally announced July 2021.
-
Toward Interoperable Cyberinfrastructure: Common Descriptions for Computational Resources and Applications
Authors:
Joe Stubbs,
Suresh Marru,
Daniel Mejia,
Daniel S. Katz,
Kyle Chard,
Maytal Dahan,
Marlon Pierce,
Michael Zentner
Abstract:
The user-facing components of the Cyberinfrastructure (CI) ecosystem, science gateways and scientific workflow systems, share a common need of interfacing with physical resources (storage systems and execution environments) to manage data and execute codes (applications). However, there is no uniform, platform-independent way to describe either the resources or the applications. To address this, we propose uniform semantics for describing resources and applications that will be relevant to a diverse set of stakeholders. We sketch a solution to the problem of a common description and catalog of resources: we describe an approach to implementing a resource registry for use by the community and discuss potential approaches to some long-term challenges. We conclude by looking ahead to the application description language.
Submitted 1 July, 2021;
originally announced July 2021.
-
MultiCloud Resource Management using Apache Mesos for Planned Integration with Apache Airavata
Authors:
Pankaj Saha,
Madhusudhan Govindaraju,
Suresh Marru,
Marlon Pierce
Abstract:
We discuss initial results and our planned approach for incorporating Apache Mesos-based resource management that will enable the design and development of scheduling strategies for Apache Airavata jobs so that they can be launched on multiple clouds, wherein several VMs do not have Public IP addresses. We present initial work and next steps on the design of a meta-scheduler using Apache Mesos. Apache Mesos presents a unified view of resources available across several clouds and clusters. Our meta-scheduler can potentially examine and identify the cases where multiple small jobs have been submitted by the same scientists and then redirect jobs from the same community account or user to different clusters. Our approach uses a NAT firewall to make nodes/VMs without a Public IP visible to Mesos for the unified view.
Submitted 5 October, 2020; v1 submitted 17 June, 2019;
originally announced June 2019.
-
Report on the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3)
Authors:
Daniel S. Katz,
Sou-Cheng T. Choi,
Kyle E. Niemeyer,
James Hetherington,
Frank Löffler,
Dan Gunter,
Ray Idaszak,
Steven R. Brandt,
Mark A. Miller,
Sandra Gesing,
Nick D. Jones,
Nic Weber,
Suresh Marru,
Gabrielle Allen,
Birgit Penzenstadler,
Colin C. Venters,
Ethan Davis,
Lorraine Hwang,
Ilian Todorov,
Abani Patra,
Miguel de Val-Borro
Abstract:
This report records and discusses the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3). The report includes a description of the keynote presentation of the workshop, which served as an overview of sustainable scientific software. It also summarizes a set of lightning talks in which speakers highlighted to-the-point lessons and challenges pertaining to sustaining scientific software. The final and main contribution of the report is a summary of the discussions, future steps, and future organization for a set of self-organized working groups on topics including developing pathways to funding scientific software; constructing useful common metrics for crediting software stakeholders; identifying principles for sustainable software engineering design; reaching out to research software organizations around the world; and building communities for software sustainability. For each group, we include a point of contact and a landing page that can be used by those who want to join that group's future activities. The main challenge left by the workshop is to see if the groups will execute these activities that they have scheduled, and how the WSSSPE community can encourage this to happen.
Submitted 6 February, 2016;
originally announced February 2016.