-
A novel association and ranking approach identifies factors affecting educational outcomes of STEM majors
Authors:
Kira Adaricheva,
Jonathan T. Brockman,
Gillian Z. Elston,
Lawrence Hobbie,
Skylar Homan,
Mohamad Khalefa,
Jiyun V. Kim,
Rochelle K. Nelson,
Sarah Samad,
Oren Segal
Abstract:
Improving undergraduate success in STEM requires identifying actionable factors that impact student outcomes, allowing institutions to prioritize key leverage points for change. We examined academic, demographic, and institutional factors that might be associated with graduation rates at two four-year colleges in the northeastern United States using a novel association algorithm called D-basis to rank attributes associated with graduation. Importantly, the data analyzed included tracking data from the National Student Clearinghouse on students who left their original institutions to determine outcomes following transfer.
Key predictors of successful graduation include performance in introductory STEM courses, the choice of first mathematics class, and flexibility in major selection. High grades in introductory biology, general chemistry, and mathematics courses were strongly correlated with graduation. At the same time, students who switched majors - especially from STEM to non-STEM - had higher overall graduation rates. Additionally, Pell eligibility and demographic factors, though less predictive overall, revealed disparities in time to graduation and retention rates.
The findings highlight the importance of early academic support in STEM gateway courses and the implementation of institutional policies that provide flexibility in major selection. Enhancing student success in introductory mathematics, biology, and chemistry courses could greatly influence graduation rates. Furthermore, customized mathematics pathways and focused support for STEM courses may assist institutions in optimizing student outcomes. This study offers data-driven insights to guide strategies to increase STEM degree completion.
Submitted 15 March, 2025;
originally announced March 2025.
-
Pedagogical Design Considerations for Mobile Augmented Reality Serious Games (MARSGs): A Literature Review
Authors:
Cassidy R. Nelson,
Joseph L. Gabbard
Abstract:
As technology advances, conceptualizations of effective strategies for teaching and learning shift. Due in part to their facilitation of unique affordances for learning, mobile devices, augmented reality, and games are all becoming more prominent elements in learning environments. In this work, we examine mobile augmented reality serious games (MARSGs) as the intersection of these technology-based experiences and to what effect their combination can yield even greater learning outcomes. We present a PRISMA review of 23 papers (from 610) spanning the entire literature timeline from 2002 to 2023. Among these works, there is wide variability in the realized application of game elements and pedagogical theories underpinning the game experience. For an educational tool to be effective, it must be designed to facilitate learning while anchored by pedagogical theory. Given that most MARSG developers are not pedagogical experts, this review further provides design considerations regarding which game elements might proffer the best of three major pedagogical theories for modern learning (cognitive constructivism, social constructivism, and behaviorism) based on existing applications. We will also briefly touch on radical constructivism and the instructional elements embedded within MARSGs. Lastly, this work offers a synthesis of current MARSG findings and extended future directions for MARSG development.
Submitted 15 November, 2024;
originally announced November 2024.
-
LA4SR: illuminating the dark proteome with generative AI
Authors:
David R. Nelson,
Ashish Kumar Jaiswal,
Noha Ismail,
Alexandra Mystikou,
Kourosh Salehi-Ashtiani
Abstract:
AI language models (LMs) show promise for biological sequence analysis. We re-engineered open-source LMs (GPT-2, BLOOM, DistilRoBERTa, ELECTRA, and Mamba, ranging from 70M to 12B parameters) for microbial sequence classification. The models achieved F1 scores up to 95 and operated 16,580x faster and at 2.9x the recall of BLASTP. They effectively classified the algal dark proteome - uncharacterized proteins comprising about 65% of total proteins - validated on new data including a new, complete Hi-C/PacBio Chlamydomonas genome. Larger (>1B) LA4SR models reached high accuracy (F1 > 86) when trained on less than 2% of available data, rapidly achieving strong generalization capacity. High accuracy was achieved when training data had intact or scrambled terminal information, demonstrating robust generalization to incomplete sequences. Finally, we provide custom AI explainability software tools for attributing amino acid patterns to AI generative processes and interpreting their outputs in evolutionary and biophysical contexts.
Submitted 11 December, 2024; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Representation Learning of Geometric Trees
Authors:
Zheng Zhang,
Allen Zhang,
Ruth Nelson,
Giorgio Ascoli,
Liang Zhao
Abstract:
Geometric trees are characterized by their tree-structured layout and spatially constrained nodes and edges, which significantly impacts their topological attributes. This inherent hierarchical structure plays a crucial role in domains such as neuron morphology and river geomorphology, but traditional graph representation methods often overlook these specific characteristics of tree structures. To address this, we introduce a new representation learning framework tailored for geometric trees. It first features a unique message passing neural network, which is both provably geometrical structure-recoverable and rotation-translation invariant. To address the data label scarcity issue, our approach also includes two innovative training targets that reflect the hierarchical ordering and geometric structure of these geometric trees. This enables fully self-supervised learning without explicit labels. We validate our method's effectiveness on eight real-world datasets, demonstrating its capability to represent geometric trees.
Submitted 16 August, 2024;
originally announced August 2024.
-
Automated MPI-X code generation for scalable finite-difference solvers
Authors:
George Bisbas,
Rhodri Nelson,
Mathias Louboutin,
Fabio Luporini,
Paul H. J. Kelly,
Gerard Gorman
Abstract:
Partial differential equations (PDEs) are crucial in modeling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs at scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP) to execute explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers based on a high-level symbolic math input. Users benefit from modeling simulations for real-world applications at a high-level symbolic abstraction and effortlessly harnessing HPC-ready distributed-memory parallelism without altering their source code. This results in drastic reductions both in execution time and developer effort. A comprehensive performance evaluation of Devito's DMP via MPI demonstrates highly competitive strong and weak scaling on CPU and GPU clusters, proving its effectiveness and capability to meet the demands of large-scale scientific simulations.
Submitted 6 January, 2025; v1 submitted 20 December, 2023;
originally announced December 2023.
-
A Novel Immersed Boundary Approach for Irregular Topography with Acoustic Wave Equations
Authors:
Edward Caunt,
Rhodri Nelson,
Fabio Luporini,
Gerard Gorman
Abstract:
Irregular terrain has a pronounced effect on the propagation of seismic and acoustic wavefields but is not straightforwardly reconciled with structured finite-difference (FD) methods used to model such phenomena. Methods currently detailed in the literature are generally limited in scope application-wise or non-trivial to apply to real-world geometries. With this in mind, a general immersed boundary treatment capable of imposing a range of boundary conditions in a relatively equation-agnostic manner has been developed, alongside a framework implementing this approach, intending to complement emerging code-generation paradigms. The approach is distinguished by the use of N-dimensional Taylor-series extrapolants constrained by boundary conditions imposed at some suitably-distributed set of surface points. The extrapolation process is encapsulated in modified derivative stencils applied in the vicinity of the boundary, utilizing hyperspherical support regions. This method ensures boundary representation is consistent with the FD discretization: both must be considered in tandem. Furthermore, high-dimensional and vector boundary conditions can be applied without approximation prior to discretization. A consistent methodology can thus be applied across free and rigid surfaces with the first and second-order acoustic wave equation formulations. Application to both equations is demonstrated, and numerical examples based on analytic and real-world topography implementing free and rigid surfaces in 2D and 3D are presented.
Submitted 7 September, 2023;
originally announced September 2023.
-
Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium
Authors:
Jennifer L. Clarke,
Laurel D. Cooper,
Monica F. Poelchau,
Tanya Z. Berardini,
Justin Elser,
Andrew D. Farmer,
Stephen Ficklin,
Sunita Kumari,
Marie-Angélique Laporte,
Rex T. Nelson,
Rie Sadohara,
Peter Selby,
Anne E. Thessen,
Brandon Whitehead,
Taner Z. Sen
Abstract:
Over the last several decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, conducted a survey to assess the status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data sharing practices by AgBioData databases are in a healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. We recommend 1) providing training for database personnel in specific data sharing techniques, as well as in ontology use; 2) further study on what metadata is shared, and how well it is shared among databases; 3) promoting an understanding of data sharing and ontologies in the stakeholder community; 4) improving data sharing and ontologies for specific phenotypic data types and formats; and 5) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means.
Submitted 17 July, 2023;
originally announced July 2023.
-
Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources
Authors:
George Bisbas,
Fabio Luporini,
Mathias Louboutin,
Rhodri Nelson,
Gerard Gorman,
Paul H. J. Kelly
Abstract:
Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However, applying temporal blocking to practical applications' stencils remains challenging. These computations often consist of sparsely located operators not aligned with the computational grid ("off-the-grid"). Our work is motivated by modeling problems in which source injections result in wavefields that must then be measured at receivers by interpolation from the gridded wavefield. The resulting data dependencies make the adoption of temporal blocking much more challenging. We propose a methodology to inspect these data dependencies and reorder the computation, leading to performance gains in stencil codes where temporal blocking has not been applicable. We implement this novel scheme in the Devito domain-specific compiler toolchain. Devito implements a domain-specific language embedded in Python to generate optimized partial differential equation solvers using the finite-difference method from high-level symbolic problem definitions. We evaluate our scheme using isotropic acoustic, anisotropic acoustic, and isotropic elastic wave propagators of industrial significance. After auto-tuning, performance evaluation shows that temporal blocking improves performance by up to 1.6x over highly optimized, vectorized, spatially blocked code.
Submitted 25 February, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Visualizing a Large Spatiotemporal Collection of Historic Photography with a Generous Interface
Authors:
Taylor Arnold,
Nathaniel Ayers,
Justin Madron,
Robert Nelson,
Lauren Tilton
Abstract:
Museums, libraries, and other cultural institutions continue to prioritize and build web-based visualization systems that increase access to and discovery of digitized archives. Prominent examples exist that illustrate impressive visualizations of a particular feature of a collection, such as interactive maps showing geographic spread or timelines capturing the temporal aspects of collections. By way of a case study, this paper presents a new web-based visualization system that allows users to simultaneously explore a large collection of images along several different dimensions (spatial, temporal, visual, textual, and through additional metadata fields including the photographer name), guided by the concept of generous interfaces. The case study is a complete redesign of a previously released digital, public humanities project called Photogrammar (2014). The paper highlights the redesign's interactive visualizations, which are now made possible by the affordances of newly available software. All of the code is open-source in order to allow for re-use of the codebase for other collections with a similar structure.
Submitted 4 September, 2020;
originally announced September 2020.
-
Scaling through abstractions -- high-performance vectorial wave simulations for seismic inversion with Devito
Authors:
Mathias Louboutin,
Fabio Luporini,
Philipp Witte,
Rhodri Nelson,
George Bisbas,
Jan Thorbecke,
Felix J. Herrmann,
Gerard Gorman
Abstract:
Devito is an open-source Python project based on domain-specific language and compiler technology. Driven by the requirements of rapid HPC applications development in exploration seismology, the language and compiler have evolved significantly since inception. Sophisticated boundary conditions, tensor contractions, sparse operations and features such as staggered grids and sub-domains are all supported; operators of essentially arbitrary complexity can be generated. To accommodate this flexibility whilst ensuring performance, data dependency analysis is utilized to schedule loops and detect computational properties such as parallelism. In this article, the generation and simulation of MPI-parallel propagators (along with their adjoints) for the pseudo-acoustic wave-equation in tilted transverse isotropic media and the elastic wave-equation are presented. Simulations are carried out on industry-scale synthetic models in an HPC cloud system and reach a performance of 28 TFLOP/s, hence demonstrating Devito's suitability for production-grade seismic inversion problems.
Submitted 22 April, 2020;
originally announced April 2020.
-
Automated Weed Detection in Aerial Imagery with Context
Authors:
Delia Bullock,
Andrew Mangeni,
Tyr Wiesner-Hanks,
Chad DeChant,
Ethan L. Stewart,
Nicholas Kaczmar,
Judith M. Kolkman,
Rebecca J. Nelson,
Michael A. Gore,
Hod Lipson
Abstract:
In this paper, we demonstrate the ability to discriminate between cultivated maize plant and grass or grass-like weed image segments using the context surrounding the image segments. While convolutional neural networks have brought state-of-the-art accuracies within object detection, errors arise when objects in different classes share similar features. This scenario often occurs when objects in images are viewed at too small a scale to discern distinct differences in features, causing images to be incorrectly classified or localized. To solve this problem, we will explore using context when classifying image segments. This technique involves feeding a convolutional neural network a central square image along with a border of its direct surroundings at train and test times. This means that although images are labelled at a smaller scale to preserve accurate localization, the network classifies the images and learns features that include the wider context. We demonstrate the benefits of this context technique in the object detection task through a case study of grass (foxtail) and grass-like (yellow nutsedge) weed detection in maize fields. In this standard situation, adding context alone nearly halved the error of the neural network from 7.1% to 4.3%. After only one epoch with context, the network also achieved a higher accuracy than the network without context did after 50 epochs. The benefits of using the context technique are likely to be particularly evident in agricultural contexts in which parts (such as leaves) of several plants may appear similar when not taking into account the context in which those parts appear.
Submitted 19 November, 2019; v1 submitted 1 October, 2019;
originally announced October 2019.
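The context technique described in the abstract above (a central square tile plus a border of its direct surroundings) can be illustrated with a minimal sketch. This is not the authors' code: the function name `crop_with_context`, the list-of-lists image representation, and the choice to fill pixels beyond the image edge with a constant are all assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def crop_with_context(image: Image, top: int, left: int,
                      size: int, border: int, fill: int = 0) -> Image:
    """Return the size x size tile at (top, left) together with `border`
    pixels of surrounding context on every side.

    Pixels that would fall outside the image are set to `fill`
    (an assumption; the abstract does not say how edges are handled).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out: Image = []
    for r in range(top - border, top + size + border):
        row = []
        for c in range(left - border, left + size + border):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else fill)
        out.append(row)
    return out


# Hypothetical 6x6 image whose pixel value encodes its position (10*row + col).
image = [[10 * r + c for c in range(6)] for r in range(6)]

# The label applies to the central 2x2 tile; the network input is the 4x4 crop.
tile = crop_with_context(image, top=2, left=2, size=2, border=1)
corner = crop_with_context(image, top=0, left=0, size=2, border=1)
```

In this sketch the label stays attached to the small central tile, which preserves localization, while the classifier would receive the larger crop and can therefore learn from the surroundings.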
-
SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python
Authors:
Pauli Virtanen,
Ralf Gommers,
Travis E. Oliphant,
Matt Haberland,
Tyler Reddy,
David Cournapeau,
Evgeni Burovski,
Pearu Peterson,
Warren Weckesser,
Jonathan Bright,
Stéfan J. van der Walt,
Matthew Brett,
Joshua Wilson,
K. Jarrod Millman,
Nikolay Mayorov,
Andrew R. J. Nelson,
Eric Jones,
Robert Kern,
Eric Larson,
CJ Carey,
İlhan Polat,
Yu Feng,
Eric W. Moore,
Jake VanderPlas,
Denis Laxalde, et al. (10 additional authors not shown)
Abstract:
SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.
Submitted 23 July, 2019;
originally announced July 2019.
-
Information Fusion to Estimate Resilience of Dense Urban Neighborhoods
Authors:
Anthony Palladino,
Elisa J. Bienenstock,
Bradley M. West,
Jake R. Nelson,
Tony H. Grubesic
Abstract:
Diverse sociocultural influences in rapidly growing dense urban areas may induce strain on civil services and reduce the resilience of those areas to exogenous and endogenous shocks. We present a novel approach with foundations in computer and social sciences, to estimate the resilience of dense urban areas at finer spatiotemporal scales compared to the state-of-the-art. We fuse multi-modal data sources to estimate resilience indicators from social science theory and leverage a structured ontology for factor combinations to enhance explainability. Estimates of destabilizing areas can improve the decision-making capabilities of civil governments by identifying critical areas needing increased social services.
Submitted 27 March, 2019;
originally announced March 2019.
-
SNS Timing System
Authors:
B. Oerter,
R. Nelson,
T. Shea,
C. Sibley
Abstract:
This poster describes the timing system being designed for the Spallation Neutron Source being built at Oak Ridge National Laboratory.
Submitted 9 November, 2001;
originally announced November 2001.