-
A Comparison-Relationship-Surrogate Evolutionary Algorithm for Multi-Objective Optimization
Authors:
Christopher M. Pierce,
Young-Kee Kim,
Ivan Bazarov
Abstract:
Evolutionary algorithms often struggle to find well-converged (e.g., small inverted generational distance on test problems) solutions to multi-objective optimization problems on a limited budget of function evaluations (here, a few hundred). The family of surrogate-assisted evolutionary algorithms (SAEAs) offers a potential solution to this shortcoming through the use of data-driven models which augment evaluations of the objective functions. A surrogate model which has shown promise in single-objective optimization predicts the "comparison relationship" between pairs of solutions (i.e., whose objective function value is smaller). In this paper, we investigate the performance of this model on multi-objective optimization problems. First, we propose a new algorithm, "CRSEA", which uses the comparison-relationship model. Numerical experiments are then performed with the DTLZ and WFG test suites plus a real-world problem from the field of accelerator physics. We find that CRSEA finds better-converged solutions than the tested SAEAs on many of the medium-scale, biobjective problems chosen from the WFG suite, suggesting the comparison-relationship surrogate as a promising tool for improving the efficiency of multi-objective optimization algorithms.
Submitted 29 April, 2025; v1 submitted 27 April, 2025;
originally announced April 2025.
-
WIP: Assessing the Effectiveness of ChatGPT in Preparatory Testing Activities
Authors:
Susmita Haldar,
Mary Pierce,
Luiz Fernando Capretz
Abstract:
This innovative practice WIP paper describes a research study that explores the integration of ChatGPT into the software testing curriculum and evaluates its effectiveness compared to human-generated testing artifacts. In a Capstone Project course, students were tasked with generating preparatory testing artifacts using ChatGPT prompts, which they had previously created manually. Their understanding and the effectiveness of the Artificial Intelligence-generated artifacts were assessed through targeted questions. The results, drawn from this in-class assignment at a North American community college, indicate that while ChatGPT can automate many testing preparation tasks, it cannot fully replace human expertise. However, students, already familiar with Information Technology at the postgraduate level, found the integration of ChatGPT into their workflow to be straightforward. The study suggests that AI can be gradually introduced into software testing education to keep pace with technological advancements.
Submitted 5 March, 2025;
originally announced March 2025.
-
Factors Influencing Performance of Students in Software Automated Test Tools Course
Authors:
Susmita Haldar,
Mary Pierce,
Luiz Fernando Capretz
Abstract:
Formal software testing education is important for building efficient QA professionals. Various aspects of quality assurance approaches are usually covered in courses for training software testing students. Automated Test Tools is one of the core courses in the software testing post-graduate curriculum due to the high demand for automated testers in the workforce. It is important to understand which factors affect student performance in the automated testing course in order to assist students early on based on their needs. The metrics considered for predicting student performance in this testing course are student engagement, grades on individual deliverables, and prerequisite courses. This study identifies the impact of assessing students based on individual vs. group activities, theoretical vs. practical components, and the effect of having taken prerequisite courses on their final grade. To carry out this research, student data was collected from the automated test tools course of a community college-based postgraduate certificate program in software testing. The dataset contained student records from the years 2021 to 2022 and consisted of information from five different semesters. Various machine learning algorithms were applied to develop an effective model for predicting students' performance in the automated software testing tools course, and finally, important features affecting the students' performance were identified. The predictive performance model of the automated test tools course that was developed by applying the logistic regression technique showed the best performance, with an accuracy score of 90%.
Submitted 31 May, 2024;
originally announced May 2024.
-
Towards Machine Learning-based Fish Stock Assessment
Authors:
Stefan Lüdtke,
Maria E. Pierce
Abstract:
The accurate assessment of fish stocks is crucial for sustainable fisheries management. However, existing statistical stock assessment models can have low forecast performance of relevant stock parameters like recruitment or spawning stock biomass, especially in ecosystems that are changing due to global warming and other anthropogenic stressors. In this paper, we investigate the use of machine learning models to improve the estimation and forecast of such stock parameters. We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees. Our hybrid model leverages the initial estimate provided by the classical model and uses the ML model to make a post-hoc correction to improve accuracy. We experiment with five different stocks and find that the forecast accuracy of recruitment and spawning stock biomass improves considerably in most cases.
Submitted 7 August, 2023;
originally announced August 2023.
-
The Science Gateway Community Institute's Consulting Services Program: Lessons for Research Software Engineering Organizations
Authors:
Marlon Pierce,
Michael Zentner,
Maytal Dahan,
Sandra Gesing,
Claire Stirm,
Linda Bailey Hayden
Abstract:
The Science Gateways Community Institute (SGCI) is an NSF Software Infrastructure for Sustained Innovation (S2I2) funded project that leads and supports the science gateway community. Major activities for SGCI include a) sustainability training, including the Focus Week week-long course designed to help science gateway operators develop sustainability plans, and the Jumpstart virtual short-course; b) usability and user experience consulting; c) a community catalog of science gateways and science gateway software; d) workforce development activities, including a coding institute for students, internship opportunities, and hackathons; e) an annual conference; and f) in-depth technical support for client gateway projects. The goals of SGCI's Embedded Technical Support component are to help the institute's clients to create new science gateways or to significantly enhance existing science gateways. Examples of the latter include helping to implement major new capabilities and to implement significant usability improvements suggested by SGCI's usability consultants. The Embedded Technical Support component was managed by Indiana University and involved research software engineers at San Diego Supercomputer Center, Texas Advanced Computing Center, Indiana University, and Purdue University (through 2019). Since 2016, the component has involved 20 research software engineers as consultants and has conducted 59 client consultations. This short paper provides a summary of lessons learned from the Embedded Technical Support program that may be useful for the research software engineering community.
Submitted 7 September, 2022;
originally announced September 2022.
-
A Framework to capture and reproduce the Absolute State of Jupyter Notebooks
Authors:
Dimuthu Wannipurage,
Suresh Marru,
Marlon Pierce
Abstract:
Jupyter Notebooks are an enormously popular tool for creating and narrating computational research projects. They also have enormous potential for creating reproducible scientific research artifacts. Capturing the complete state of a notebook has additional benefits; for instance, the notebook execution may be split between local and remote resources, where the latter may have more powerful processing capabilities or store large or access-limited data. There are several challenges for making notebooks fully reproducible when examined in detail. The notebook code must be replicated entirely, and the underlying Python runtime environments must be identical. More subtle problems arise in replicating referenced data, external library dependencies, and runtime variable states. This paper presents solutions to these problems using Jupyter's standard extension mechanisms to create an archivable system state for a running notebook. We show that these additional mechanisms, which involve interacting with the underlying Linux kernel, do not introduce substantial execution time overhead, demonstrating the approach's feasibility.
Submitted 31 March, 2022;
originally announced April 2022.
-
Experiences with managing data parallel computational workflows for High-throughput Fragment Molecular Orbital (FMO) Calculations
Authors:
Dimuthu Wannipurage,
Indrajit Deb,
Eroma Abeysinghe,
Sudhakar Pamidighantam,
Suresh Marru,
Marlon Pierce,
Aaron T. Frank
Abstract:
Fragment Molecular Orbital (FMO) calculations provide a framework to speed up quantum mechanical calculations and so can be used to explore structure-energy relationships in large and complex biomolecular systems. These calculations are still onerous, especially when applied to large sets of molecules. Therefore, cyberinfrastructure that provides mechanisms and user interfaces to manage job submissions, failed job resubmissions, data retrieval, and data storage for these calculations is needed. Motivated by the need to rapidly identify drugs that are likely to bind to targets implicated in SARS-CoV-2, the virus that causes COVID-19, we developed a static parameter sweeping framework with Apache Airavata middleware to apply to complexes formed between SARS-CoV-2 M-pro (the main protease in SARS-CoV-2) and 2820 small molecules in a drug-repurposing library. Here we describe the implementation of our framework for managing the executions of the high-throughput FMO calculations. The approach is general and so should find utility in large-scale FMO calculations on biomolecular systems.
Submitted 28 January, 2022;
originally announced January 2022.
-
Experiences with Integrating Custos Security Services
Authors:
Isuru Ranawaka,
Samitha Liyanage,
Dannon Baker,
Alexandru Mahmoud,
Juleen Graham,
Terry Fleury,
Dimuthu Wannipurage,
Yu Ma,
Enis Afgan,
Jim Basney,
Suresh Marru,
Marlon Pierce
Abstract:
Science gateways are user-facing cyberinfrastructure that provide researchers and educators with Web-based access to scientific software, computing, and data resources. Managing user identities, accounts, and permissions are essential tasks for science gateways, and gateways likewise must manage secure connections between their middleware and remote resources. The Custos project is an effort to build open source software that can be operated as a multi-tenanted service that provides reliable implementations of common science gateway cybersecurity needs, including federated authentication, identity management, group and authorization management, and resource credential management. Custos aims further to provide integrated solutions through these capabilities, delivering end-to-end support for several science gateway usage scenarios. This paper examines four deployment scenarios using Custos and associated extensions beyond previously described work. The first capability illustrated by these scenarios is the need for Custos to provide hierarchical tenant management that allows multiple gateway deployments to be federated together and also to support consolidated, hosted science gateway platform services. The second capability illustrated by these scenarios is the need to support service accounts that can support non-browser applications and agent applications that can act on behalf of users on edge resources. We illustrate how the latter can be built using Web security standards combined with Custos permission management mechanisms.
Submitted 8 July, 2021;
originally announced July 2021.
-
A Multi-Protocol, Secure, and Dynamic Data Storage Integration Framework for Multi-tenanted Science Gateway Middleware
Authors:
Dimuthu Wannipurage,
Isuru Ranawaka,
Eroma Abeysinghe,
Marcus Christie,
Suresh Marru,
Marlon Pierce
Abstract:
Science gateways are user-centric, end-to-end cyberinfrastructure for managing scientific data and executions of computational software on distributed resources. In order to simplify the creation and management of science gateways, we have pursued a multi-tenanted, platform-as-a-service approach that allows multiple gateway front-ends (portals) to be integrated with a consolidated middleware that manages the movement of data and the execution of workflows on multiple back-end scientific computing resources. An important challenge for this approach is to provide an end-to-end data movement and management solution that allows gateway users to integrate their own data stores with the gateway platform. These user-provided data stores may include commercial cloud-based object store systems, third-party data stores accessed through APIs such as REST endpoints, and users' own local storage resources. In this paper, we present a solution design and implementation based on the integration of a managed file transfer (MFT) service (Airavata MFT) into the platform.
Submitted 8 July, 2021;
originally announced July 2021.
-
Toward Interoperable Cyberinfrastructure: Common Descriptions for Computational Resources and Applications
Authors:
Joe Stubbs,
Suresh Marru,
Daniel Mejia,
Daniel S. Katz,
Kyle Chard,
Maytal Dahan,
Marlon Pierce,
Michael Zentner
Abstract:
The user-facing components of the Cyberinfrastructure (CI) ecosystem, science gateways and scientific workflow systems, share a common need of interfacing with physical resources (storage systems and execution environments) to manage data and execute codes (applications). However, there is no uniform, platform-independent way to describe either the resources or the applications. To address this, we propose uniform semantics for describing resources and applications that will be relevant to a diverse set of stakeholders. We sketch a solution to the problem of a common description and catalog of resources: we describe an approach to implementing a resource registry for use by the community and discuss potential approaches to some long-term challenges. We conclude by looking ahead to the application description language.
Submitted 1 July, 2021;
originally announced July 2021.
-
MultiCloud Resource Management using Apache Mesos for Planned Integration with Apache Airavata
Authors:
Pankaj Saha,
Madhusudhan Govindaraju,
Suresh Marru,
Marlon Pierce
Abstract:
We discuss initial results and our planned approach for incorporating Apache Mesos-based resource management that will enable the design and development of scheduling strategies for Apache Airavata jobs so that they can be launched on multiple clouds, wherein several VMs do not have public IP addresses. We present initial work and next steps on the design of a meta-scheduler using Apache Mesos. Apache Mesos presents a unified view of resources available across several clouds and clusters. Our meta-scheduler can potentially examine and identify the cases where multiple small jobs have been submitted by the same scientists and then redirect jobs from the same community account or user to different clusters. Our approach uses a NAT firewall to make nodes/VMs without a public IP visible to Mesos for the unified view.
Submitted 5 October, 2020; v1 submitted 17 June, 2019;
originally announced June 2019.
-
Community Organizations: Changing the Culture in Which Research Software Is Developed and Sustained
Authors:
Daniel S. Katz,
Lois Curfman McInnes,
David E. Bernholdt,
Abigail Cabunoc Mayes,
Neil P. Chue Hong,
Jonah Duckles,
Sandra Gesing,
Michael A. Heroux,
Simon Hettrick,
Rafael C. Jimenez,
Marlon Pierce,
Belinda Weaver,
Nancy Wilkins-Diehr
Abstract:
Software is the key crosscutting technology that enables advances in mathematics, computer science, and domain-specific science and engineering to achieve robust simulations and analysis for science, engineering, and other research fields. However, software itself has not traditionally received focused attention from research communities; rather, software has evolved organically and inconsistently, with its development largely as by-products of other initiatives. Moreover, challenges in scientific software are expanding due to disruptive changes in computer hardware, increasing scale and complexity of data, and demands for more complex simulations involving multiphysics, multiscale modeling and outer-loop analysis. In recent years, community members have established a range of grass-roots organizations and projects to address these growing technical and social challenges in software productivity, quality, reproducibility, and sustainability. This article provides an overview of such groups and discusses opportunities to leverage their synergistic activities while nurturing work toward emerging software ecosystems.
Submitted 7 December, 2018; v1 submitted 20 November, 2018;
originally announced November 2018.