Showing 1–50 of 157 results for author: Foster, I

Searching in archive cs.
  1. arXiv:2505.05428  [pdf, ps, other]

    cs.MA cs.DC

    Empowering Scientific Workflows with Federated Agents

    Authors: J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Mansi Sakarvadia, Kyle Chard, Ian Foster

    Abstract: Agentic systems, in which diverse agents cooperate to tackle challenging problems, are exploding in popularity in the AI community. However, the agentic frameworks used to build these systems have not previously enabled use with research cyberinfrastructure. Here we introduce Academy, a modular and extensible middleware designed to deploy autonomous agents across the federated research ecosystem,…

    Submitted 8 May, 2025; originally announced May 2025.

  2. arXiv:2505.04846  [pdf, ps, other]

    cs.IR cs.CE cs.CL cs.DC cs.LG

    HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights

    Authors: Ozan Gokdemir, Carlo Siebenschuh, Alexander Brace, Azton Wells, Brian Hsu, Kyle Hippe, Priyanka V. Setty, Aswathy Ajith, J. Gregory Pauloski, Varuni Sastry, Sam Foreman, Huihuo Zheng, Heng Ma, Bharat Kale, Nicholas Chia, Thomas Gibbs, Michael E. Papka, Thomas Brettin, Francis J. Alexander, Anima Anandkumar, Ian Foster, Rick Stevens, Venkatram Vishwanath, Arvind Ramanathan

    Abstract: The volume of scientific literature is growing exponentially, leading to underutilized discoveries, duplicated efforts, and limited cross-disciplinary collaboration. Retrieval Augmented Generation (RAG) offers a way to assist scientists by improving the factuality of Large Language Models (LLMs) in processing this influx of information. However, scaling RAG to handle millions of articles introduce…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted at the Platform for Advanced Scientific Computing Conference (PASC 25), June 16-18, 2025, Brugg-Windisch, Switzerland

    ACM Class: H.3.3; I.2.7

  3. arXiv:2505.03049  [pdf, ps, other]

    cs.LG cond-mat.mtrl-sci

    34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery

    Authors: Yoel Zimmermann, Adib Bazgir, Alexander Al-Feghali, Mehrad Ansari, L. Catherine Brinson, Yuan Chiang, Defne Circi, Min-Hsueh Chiu, Nathan Daelman, Matthew L. Evans, Abhijeet S. Gangan, Janine George, Hassan Harb, Ghazal Khalighinejad, Sartaaj Takrim Khan, Sascha Klawohn, Magdalena Lederbauer, Soroush Mahjoubi, Bernadette Mohr, Seyed Mohamad Moosavi, Aakash Naik, Aleyna Beste Ozhan, Dieter Plessers, Aritra Roy, Fabian Schöppach , et al. (8 additional authors not shown)

    Abstract: Large Language Models (LLMs) are reshaping many aspects of materials science and chemistry research, enabling advances in molecular property prediction, materials design, scientific automation, knowledge extraction, and more. Recent developments demonstrate that the latest class of models are able to integrate structured and unstructured data, assist in hypothesis generation, and streamline resear…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2411.15221

  4. arXiv:2505.01435  [pdf, other]

    cs.IR cs.CL cs.DC cs.LG

    AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine

    Authors: Carlo Siebenschuh, Kyle Hippe, Ozan Gokdemir, Alexander Brace, Arham Khan, Khalid Hossain, Yadu Babuji, Nicholas Chia, Venkatram Vishwanath, Rick Stevens, Arvind Ramanathan, Ian Foster, Robert Underwood

    Abstract: Language models for scientific tasks are trained on text from scientific publications, most distributed as PDFs that require parsing. PDF parsing approaches range from inexpensive heuristics (for simple documents) to computationally intensive ML-driven systems (for complex or degraded ones). The choice of the "best" parser for a particular document depends on its computational cost and the accurac…

    Submitted 23 April, 2025; originally announced May 2025.

    Comments: This paper has been accepted at the Eighth Annual Conference on Machine Learning and Systems (MLSys 2025)

  5. arXiv:2504.01990  [pdf, other]

    cs.AI

    Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

    Authors: Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, Yuheng Cheng, Suyuchen Wang, Xiaoqiang Wang, Yuyu Luo, Haibo Jin, Peiyan Zhang, Ollie Liu, Jiaqi Chen, Huan Zhang, Zhaoyang Yu, Haochen Shi, Boyan Li, Dekun Wu, Fengwei Teng, Xiaojun Jia , et al. (22 additional authors not shown)

    Abstract: The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate…

    Submitted 31 March, 2025; originally announced April 2025.

  6. Globus Service Enhancements for Exascale Applications and Facilities

    Authors: Weijian Zheng, Jack Kordas, Tyler J. Skluzacek, Raj Kettimuthu, Ian Foster

    Abstract: Many extreme-scale applications require the movement of large quantities of data to, from, and among leadership computing facilities, as well as other scientific facilities and the home institutions of facility users. These applications, particularly when leadership computing facilities are involved, can touch upon edge cases (e.g., terabyte files) that had not been a focus of previous Globus opti…

    Submitted 29 March, 2025; originally announced March 2025.

  7. arXiv:2503.12752  [pdf, other]

    cs.DC

    WRATH: Workload Resilience Across Task Hierarchies in Task-based Parallel Programming Frameworks

    Authors: Sicheng Zhou, Zhuozhao Li, Valérie Hayot-Sasson, Haochen Pan, Maxime Gonthier, J. Gregory Pauloski, Ryan Chard, Kyle Chard, Ian Foster

    Abstract: Failures in Task-based Parallel Programming (TBPP) can severely degrade performance and result in incomplete or incorrect outcomes. Existing failure-handling approaches, including reactive, proactive, and resilient methods such as retry and checkpointing mechanisms, often apply uniform retry mechanisms regardless of the root cause of failures, failing to account for the unique characteristics of T…

    Submitted 27 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: Preprint version

  8. arXiv:2502.20309  [pdf, other]

    cs.AI

    EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants

    Authors: Franck Cappello, Sandeep Madireddy, Robert Underwood, Neil Getty, Nicholas Lee-Ping Chia, Nesar Ramachandra, Josh Nguyen, Murat Keceli, Tanwi Mallick, Zilinghan Li, Marieme Ngom, Chenhui Zhang, Angel Yanguas-Gil, Evan Antoniuk, Bhavya Kailkhura, Minyang Tian, Yufeng Du, Yuan-Sen Ting, Azton Wells, Bogdan Nicolae, Avinash Maurya, M. Mustafa Rafique, Eliu Huerta, Bo Li, Ian Foster , et al. (1 additional authors not shown)

    Abstract: Recent advancements have positioned AI, and particularly Large Language Models (LLMs), as transformative tools for scientific research, capable of addressing complex tasks that require reasoning, problem-solving, and decision-making. Their exceptional capabilities suggest their potential as scientific research assistants but also highlight the need for holistic, rigorous, and domain-specific evalu…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 33 pages, 18 figures

  9. arXiv:2502.12280  [pdf, other]

    cs.DC cs.AI

    Connecting Large Language Model Agent to High Performance Computing Resource

    Authors: Heng Ma, Alexander Brace, Carlo Siebenschuh, Greg Pauloski, Ian Foster, Arvind Ramanathan

    Abstract: The Large Language Model agent workflow enables the LLM to invoke tool functions to increase the performance on specific scientific domain questions. To tackle large scale of scientific research, it requires access to computing resource and parallel computing setup. In this work, we implemented Parsl to the LangChain/LangGraph tool call setup, to bridge the gap between the LLM agent to the computi…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 7 pages, 4 figures

    ACM Class: I.2.11

  10. arXiv:2502.07237  [pdf, other]

    cs.LG cs.CL q-bio.BM stat.ML

    DrugImproverGPT: A Large Language Model for Drug Optimization with Fine-Tuning via Structured Policy Optimization

    Authors: Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian Foster, Rick Stevens

    Abstract: Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research delves into the realm of drug optimization and introduces a novel reinforcement learning algorithm to finetune a drug optimization LLM-based generative model, enhancing the original drug across target objectives, while retaining the beneficial chemical properties of the original drug.…

    Submitted 10 February, 2025; originally announced February 2025.

  11. arXiv:2502.06891  [pdf, other]

    q-bio.BM cs.CL cs.LG

    ScaffoldGPT: A Scaffold-based GPT Model for Drug Optimization

    Authors: Xuefeng Liu, Songhao Jiang, Ian Foster, Jinbo Xu, Rick Stevens

    Abstract: Drug optimization has become increasingly crucial in light of fast-mutating virus strains and drug-resistant cancer cells. Nevertheless, it remains challenging as it necessitates retaining the beneficial properties of the original drug while simultaneously enhancing desired attributes beyond its scope. In this work, we aim to tackle this challenge by introducing ScaffoldGPT, a novel Generative Pre…

    Submitted 11 April, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

  12. arXiv:2502.05293  [pdf, other]

    cs.DC

    Optimizing Fine-Grained Parallelism Through Dynamic Load Balancing on Multi-Socket Many-Core Systems

    Authors: Wenyi Wang, Maxime Gonthier, Poornima Nookala, Haochen Pan, Ian Foster, Ioan Raicu, Kyle Chard

    Abstract: Achieving efficient task parallelism on many-core architectures is an important challenge. The widely used GNU OpenMP implementation of the popular OpenMP parallel programming model incurs high overhead for fine-grained, short-running tasks due to time spent on runtime synchronization. In this work, we introduce and analyze three key advances that collectively achieve significant performance gains…

    Submitted 19 March, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

    Comments: 13 pages, 11 figures, camera-ready, accepted by IPDPS2025

    ACM Class: D.1.3

  13. arXiv:2501.10651  [pdf, other]

    cs.DC cond-mat.mtrl-sci cs.LG

    MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow

    Authors: Xiaoli Yan, Nathaniel Hudson, Hyun Park, Daniel Grzenda, J. Gregory Pauloski, Marcus Schwarting, Haochen Pan, Hassan Harb, Samuel Foreman, Chris Knight, Tom Gibbs, Kyle Chard, Santanu Chaudhuri, Emad Tajkhorshid, Ian Foster, Mohamad Moosavi, Logan Ward, E. A. Huerta

    Abstract: We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screeni…

    Submitted 17 January, 2025; originally announced January 2025.

    Comments: 13 pages, 10 figures

  14. arXiv:2501.09557  [pdf, other]

    cs.DC

    Core Hours and Carbon Credits: Incentivizing Sustainability in HPC

    Authors: Alok Kamatar, Maxime Gonthier, Valerie Hayot-Sasson, Andre Bauer, Marcin Copik, Torsten Hoefler, Raul Castro Fernandez, Kyle Chard, Ian Foster

    Abstract: Realizing a shared responsibility between providers and consumers is critical to manage the sustainability of HPC. However, while cost may motivate efficiency improvements by infrastructure operators, broader progress is impeded by a lack of user incentives. We conduct a survey of HPC users that reveals fewer than 30 percent are aware of their energy consumption, and that energy efficiency is amon…

    Submitted 16 January, 2025; originally announced January 2025.

  15. arXiv:2501.01316  [pdf]

    cs.DC

    Computational Grids

    Authors: Ian Foster, Carl Kesselman

    Abstract: In this introductory chapter, we lay the groundwork for the rest of the book by providing a more detailed picture of the expected purpose, shape, and architecture of future grid systems. We structure the chapter in terms of six questions that we believe are central to this discussion: Why do we need computational grids? What types of applications will grids be used for? Who will use grids? How wil…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Chapter 2 of the book, "The Grid: Blueprint for a New Computing Infrastructure", Elsevier, 1998

  16. arXiv:2411.15221  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph

    Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

    Authors: Yoel Zimmermann, Adib Bazgir, Zartashia Afzal, Fariha Agbere, Qianxiang Ai, Nawaf Alampara, Alexander Al-Feghali, Mehrad Ansari, Dmytro Antypov, Amro Aswad, Jiaru Bai, Viktoriia Baibakova, Devi Dutta Biswajeet, Erik Bitzek, Joshua D. Bocarsly, Anna Borisova, Andres M Bran, L. Catherine Brinson, Marcel Moran Calderon, Alessandro Canalicchio, Victor Chen, Yuan Chiang, Defne Circi, Benjamin Charmes, Vikrant Chaudhary , et al. (119 additional authors not shown)

    Abstract: Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) mo…

    Submitted 2 January, 2025; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: Updating author information, the submission remains largely unchanged. 98 pages total

  17. arXiv:2411.04257  [pdf, other]

    cs.LG

    LSHBloom: Memory-efficient, Extreme-scale Document Deduplication

    Authors: Arham Khan, Robert Underwood, Carlo Siebenschuh, Yadu Babuji, Aswathy Ajith, Kyle Hippe, Ozan Gokdemir, Alexander Brace, Kyle Chard, Ian Foster

    Abstract: Deduplication is a major focus for assembling and curating training datasets for large language models (LLM) -- detecting and eliminating additional instances of the same content -- in large collections of technical documents. Unrestrained, duplicates in the training dataset increase training costs and lead to undesirable properties such as memorization in trained models or cheating on evaluation.…

    Submitted 12 May, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

  18. Workflows Community Summit 2024: Future Trends and Challenges in Scientific Workflows

    Authors: Rafael Ferreira da Silva, Deborah Bard, Kyle Chard, Shaun de Witt, Ian T. Foster, Tom Gibbs, Carole Goble, William Godoy, Johan Gustafsson, Utz-Uwe Haus, Stephen Hudson, Shantenu Jha, Laila Los, Drew Paine, Frédéric Suter, Logan Ward, Sean Wilkinson, Marcos Amaris, Yadu Babuji, Jonathan Bader, Riccardo Balin, Daniel Balouek, Sarah Beecroft, Khalid Belhajjame, Rajat Bhattarai , et al. (86 additional authors not shown)

    Abstract: The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and exascale computing has revolutionized scientific w…

    Submitted 18 October, 2024; originally announced October 2024.

    Report number: ORNL/TM-2024/3573

  19. arXiv:2410.12927  [pdf, other]

    cs.LG cs.AI

    Deep Model Merging: The Sister of Neural Network Interpretability -- A Survey

    Authors: Arham Khan, Todd Nief, Nathaniel Hudson, Mansi Sakarvadia, Daniel Grzenda, Aswathy Ajith, Jordan Pettyjohn, Kyle Chard, Ian Foster

    Abstract: We survey the model merging literature through the lens of loss landscape geometry to connect observations from empirical studies on model merging and loss landscape analysis to phenomena that govern neural network training and the emergence of their inner representations. We distill repeated empirical observations from the literature in these fields into descriptions of four major characteristics…

    Submitted 21 March, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

  20. arXiv:2410.12092  [pdf, other]

    cs.DC

    Accelerating Python Applications with Dask and ProxyStore

    Authors: J. Gregory Pauloski, Klaudiusz Rydzy, Valerie Hayot-Sasson, Ian Foster, Kyle Chard

    Abstract: Applications are increasingly written as dynamic workflows underpinned by an execution framework that manages asynchronous computations across distributed hardware. However, execution frameworks typically offer one-size-fits-all solutions for data flow management, which can restrict performance and scalability. ProxyStore, a middleware layer that optimizes data flow via an advanced pass-by-referen…

    Submitted 17 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: To be presented as a demo at the SC24 Workshop on High Performance Python for Science at Scale (HPPSS)

  21. arXiv:2410.02159  [pdf, other]

    cs.LG cs.AI cs.CL

    Mitigating Memorization In Language Models

    Authors: Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney

    Abstract: Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for example, when data are private or sensitive. In this work, we investigate methods to mitigate memorization: three regularizer-based, three finetuning-bas…

    Submitted 28 January, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

  22. arXiv:2410.00709  [pdf, other]

    q-bio.QM cs.AI stat.ML

    Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches

    Authors: Xuefeng Liu, Songhao Jiang, Xiaotian Duan, Archit Vasan, Chong Liu, Chih-chan Tien, Heng Ma, Thomas Brettin, Fangfang Xia, Ian T. Foster, Rick L. Stevens

    Abstract: Protein-ligand binding is the process by which a small molecule (drug or inhibitor) attaches to a target protein. The binding affinity, which refers to the strength of this interaction, is central to many important problems in bioinformatics such as drug design. An extensive amount of work has been devoted to predicting binding affinity over the past decades due to its significance. In this paper,…

    Submitted 29 September, 2024; originally announced October 2024.

  23. arXiv:2409.16495  [pdf, other]

    cs.LG cs.DC

    Flight: A FaaS-Based Framework for Complex and Hierarchical Federated Learning

    Authors: Nathaniel Hudson, Valerie Hayot-Sasson, Yadu Babuji, Matt Baughman, J. Gregory Pauloski, Ryan Chard, Ian Foster, Kyle Chard

    Abstract: Federated Learning (FL) is a decentralized machine learning paradigm where models are trained on distributed devices and are aggregated at a central server. Existing FL frameworks assume simple two-tier network topologies where end devices are directly connected to the aggregation server. While this is a practical mental model, it does not exploit the inherent topology of real-world distributed sy…

    Submitted 24 September, 2024; originally announced September 2024.

  24. arXiv:2408.14627  [pdf, ps, other]

    cs.DB cs.CE cs.CY cs.ET

    Sustainable Data Democratization: A Multifaceted Investment for an Equitable Future

    Authors: Michela Taufer, Valerio Pascucci, Christine R. Kirkpatric, Ian T. Foster

    Abstract: The urgent need for data democratization in scientific research was the focal point of a panel discussion at SC23 in Denver, Colorado, from November 12 to 17, 2023. This article summarizes the outcomes of that discussion and subsequent conversations. We advocate for strategic investments in financial, human, and technological resources for sustainable data democratization. Emphasizing that data is…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 5 pages

  25. arXiv:2408.14434  [pdf, other]

    cs.DC cs.LG

    Employing Artificial Intelligence to Steer Exascale Workflows with Colmena

    Authors: Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster

    Abstract: Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities. We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how…

    Submitted 26 August, 2024; originally announced August 2024.

  26. arXiv:2408.07236  [pdf, other]

    cs.DC

    TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks

    Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Maxime Gonthier, Nathaniel Hudson, Haochen Pan, Sicheng Zhou, Ian Foster, Kyle Chard

    Abstract: Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a computational goal. Task-based execution frameworks abstract the parallel execution of an application's tasks on arbitrary hardware. Research into these task ex…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: To appear in the Proceedings of 20th IEEE International Conference on e-Science

  27. arXiv:2407.11432  [pdf, other]

    cs.DC

    Octopus: Experiences with a Hybrid Event-Driven Architecture for Distributed Scientific Computing

    Authors: Haochen Pan, Ryan Chard, Sicheng Zhou, Alok Kamatar, Rafael Vescovi, Valérie Hayot-Sasson, André Bauer, Maxime Gonthier, Kyle Chard, Ian Foster

    Abstract: Scientific research increasingly relies on distributed computational resources, storage systems, networks, and instruments, ranging from HPC and cloud systems to edge devices. Event-driven architecture (EDA) benefits applications targeting distributed research infrastructures by enabling the organization, communication, processing, reliability, and security of events generated from many sources. T…

    Submitted 28 September, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 12 pages and 8 figures. Camera-ready version for FTXS'24 (https://sites.google.com/view/ftxs2024)

  28. arXiv:2407.09434  [pdf, other]

    cs.LG cs.AI cs.CE eess.SY

    Foundation Models for the Electric Power Grid

    Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belyi, Ricardo J. Bessa, Bishnu Prasad Bhattarai , et al. (2 additional authors not shown)

    Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transi…

    Submitted 12 November, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S; Lead contact: H.F.H

  29. arXiv:2407.01764  [pdf, other]

    cs.DC

    Object Proxy Patterns for Accelerating Distributed Applications

    Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster

    Abstract: Workflow and serverless frameworks have empowered new approaches to distributed application design by abstracting compute resources. However, their typically limited or one-size-fits-all support for advanced data flow patterns leaves optimization to the application programmer -- optimization that becomes more difficult as data become larger. The transparent object proxy, which provides wide-area r…

    Submitted 2 December, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted for publication in Transactions on Parallel and Distributed Systems

  30. arXiv:2406.17710  [pdf, other]

    cs.DC

    GreenFaaS: Maximizing Energy Efficiency of HPC Workloads with FaaS

    Authors: Alok Kamatar, Valerie Hayot-Sasson, Yadu Babuji, Andre Bauer, Gourav Rattihalli, Ninad Hogade, Dejan Milojicic, Kyle Chard, Ian Foster

    Abstract: Application energy efficiency can be improved by executing each application component on the compute element that consumes the least energy while also satisfying time constraints. In principle, the function as a service (FaaS) paradigm should simplify such optimizations by abstracting away compute location, but existing FaaS systems do not provide for user transparency over application energy cons…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 10 figures

  31. arXiv:2406.06348  [pdf, other]

    cs.LG cs.DC stat.ME

    Causal Discovery over High-Dimensional Structured Hypothesis Spaces with Causal Graph Partitioning

    Authors: Ashka Shah, Adela DePavia, Nathaniel Hudson, Ian Foster, Rick Stevens

    Abstract: The aim in many sciences is to understand the mechanisms that underlie the observed distribution of variables, starting from a set of initial hypotheses. Causal discovery allows us to infer mechanisms as sets of cause and effect relationships in a generalized way -- without necessarily tailoring to a specific domain. Causal discovery algorithms search over a structured hypothesis space, defined by…

    Submitted 3 March, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: TMLR 03/2025

  32. Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers

    Authors: Thomas Bouvier, Bogdan Nicolae, Hugo Chaugier, Alexandru Costan, Ian Foster, Gabriel Antoniu

    Abstract: Deep learning has emerged as a powerful method for extracting valuable information from large volumes of data. However, when new training data arrives continuously (i.e., is not fully available from the beginning), incremental training suffers from catastrophic forgetting (i.e., new patterns are reinforced at the expense of previously acquired knowledge). Training from scratch each time new traini…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2024, Philadelphia (PA), United States

  33. arXiv:2405.15828  [pdf, other]

    cs.DL cs.AI

    Oil & Water? Diffusion of AI Within and Across Scientific Fields

    Authors: Eamon Duede, William Dolan, André Bauer, Ian Foster, Karim Lakhani

    Abstract: This study empirically investigates claims of the increasing ubiquity of artificial intelligence (AI) within roughly 80 million research publications across 20 diverse scientific fields, by examining the change in scholarly engagement with AI from 1985 through 2022. We observe exponential growth, with AI-engaged publications increasing approximately thirteenfold (13x) across all fields, suggesting…

    Submitted 23 May, 2024; originally announced May 2024.

  34. arXiv:2405.09939  [pdf, other]

    cs.CL cs.AI

    SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation

    Authors: Yuwei Wan, Yixuan Liu, Aswathy Ajith, Clara Grazian, Bram Hoex, Wenjie Zhang, Chunyu Kit, Tong Xie, Ian Foster

    Abstract: We introduce SciQAG, a novel framework for automatically generating high-quality science question-answer pairs from a large corpus of scientific literature based on large language models (LLMs). SciQAG consists of a QA generator and a QA evaluator, which work together to extract diverse and research-level questions and answers from scientific papers. Utilizing this framework, we construct a large-…

    Submitted 9 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  35. arXiv:2404.19717  [pdf, other]

    cs.DC

    Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study

    Authors: Lukasz Lacinski, Lee Liming, Steven Turoscy, Cameron Harr, Kyle Chard, Eli Dart, Paul Durack, Sasha Ames, Forrest M. Hoffman, Ian T. Foster

    Abstract: We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and OR…

    Submitted 30 April, 2024; originally announced April 2024.

  36. MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes

    Authors: Xiaolong Ma, Feng Yan, Lei Yang, Ian Foster, Michael E. Papka, Zhengchun Liu, Rajkumar Kettimuthu

    Abstract: First-come first-serve scheduling can result in substantial (up to 10%) of transiently idle nodes on supercomputers. Recognizing that such unfilled nodes are well-suited for deep neural network (DNN) training, due to the flexible nature of DNN training tasks, Liu et al. proposed that the re-scaling DNN training tasks to fit gaps in schedules be formulated as a mixed-integer linear programming (MIL…

    Submitted 24 April, 2024; originally announced April 2024.

  37. arXiv:2404.04225  [pdf, other]

    physics.chem-ph cs.LG

    Twins in rotational spectroscopy: Does a rotational spectrum uniquely identify a molecule?

    Authors: Marcus Schwarting, Nathan A. Seifert, Michael J. Davis, Ben Blaiszik, Ian Foster, Kirill Prozument

    Abstract: Rotational spectroscopy is the most accurate method for determining structures of molecules in the gas phase. It is often assumed that a rotational spectrum is a unique "fingerprint" of a molecule. The availability of large molecular databases and the development of artificial intelligence methods for spectroscopy makes the testing of this assumption timely. In this paper, we pose the determinatio…

    Submitted 5 April, 2024; originally announced April 2024.

  38. arXiv:2404.02163  [pdf, other]

    cs.IT

    FastqZip: An Improved Reference-Based Genome Sequence Lossy Compression Framework

    Authors: Yuanjian Liu, Huihao Luo, Zhijun Han, Yao Hu, Yehui Yang, Kyle Chard, Sheng Di, Ian Foster, Jiesheng Wu

    Abstract: Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ format files with an improved reference-based compression algorithm to achieve a higher compression ratio than other state-of-the-art algorithms. We propose FastqZip…

    Submitted 22 February, 2024; originally announced April 2024.

  39. UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving

    Authors: Yifei Li, Ryan Chard, Yadu Babuji, Kyle Chard, Ian Foster, Zhuozhao Li

    Abstract: Modern scientific applications are increasingly decomposable into individual functions that may be deployed across distributed and diverse cyberinfrastructure such as supercomputers, clouds, and accelerators. Such applications call for new approaches to programming, distributed execution, and function-level management. We present UniFaaS, a parallel programming framework that relies on a federated…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 13 pages, 13 figures, IPDPS2024

  40. arXiv:2403.06077  [pdf, other]

    cs.DC

    Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments

    Authors: Jim Pruyne, Valerie Hayot-Sasson, Weijian Zheng, Ryan Chard, Justin M. Wozniak, Tekin Bicer, Kyle Chard, Ian T. Foster

    Abstract: Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved, requires sophisticated "flows" defining the…

    Submitted 9 March, 2024; originally announced March 2024.

  41. arXiv:2402.14129  [pdf, other]

    cs.IR cs.CL

    Combining Language and Graph Models for Semi-structured Information Extraction on the Web

    Authors: Zhi Hong, Kyle Chard, Ian Foster

    Abstract: Relation extraction is an efficient way of mining the extraordinary wealth of human knowledge on the Web. Existing methods rely on domain-specific training data or produce noisy outputs. We focus here on extracting targeted relations from semi-structured web pages given only a short description of the relation. We present GraphScholarBERT, an open-domain information extraction method based on a jo…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures

  42. arXiv:2402.03480  [pdf, other]

    cs.LG cs.AI cs.DC

    Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision

    Authors: Nathaniel Hudson, J. Gregory Pauloski, Matt Baughman, Alok Kamatar, Mansi Sakarvadia, Logan Ward, Ryan Chard, André Bauer, Maksim Levental, Wenyi Wang, Will Engler, Owen Price Skelly, Ben Blaiszik, Rick Stevens, Kyle Chard, Ian Foster

    Abstract: Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$Σ$. We describe a vision for the ecosystem of TPM users and providers that caters to t…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 10 pages, 3 figures, accepted for publication in the proceedings of the 10th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT2023)

  43. arXiv:2401.04552  [pdf, other]

    cs.DC

    XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing

    Authors: Torsten Hoefler, Marcin Copik, Pete Beckman, Andrew Jones, Ian Foster, Manish Parashar, Daniel Reed, Matthias Troyer, Thomas Schulthess, Dan Ernst, Jack Dongarra

    Abstract: HPC and Cloud have evolved independently, specializing their innovations into performance or productivity. Acceleration as a Service (XaaS) is a recipe to empower both fields with a shared execution platform that provides transparent access to computing resources, regardless of the underlying cloud or HPC service provider. Bridging HPC and cloud advancements, XaaS presents a unified architecture b…

    Submitted 9 January, 2024; originally announced January 2024.

  44. arXiv:2401.02524  [pdf, other]

    cs.LG cs.AI cs.CV

    Comprehensive Exploration of Synthetic Data Generation: A Survey

    Authors: André Bauer, Simon Trapp, Michael Stenger, Robert Leppich, Samuel Kounev, Mark Leznik, Kyle Chard, Ian Foster

    Abstract: Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic data emerges as a solution, but the abundance of released models and limited overview literature pose challenges for decision-making. This work surveys 417 Synthe…

    Submitted 1 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Fixed bug in Figure 44

  45. arXiv:2312.10188  [pdf, other]

    cs.LG

    WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data

    Authors: Maurice Weber, Carlo Siebenschuh, Rory Butler, Anton Alexandrov, Valdemar Thanner, Georgios Tsolakis, Haris Jabbar, Ian Foster, Bo Li, Rick Stevens, Ce Zhang

    Abstract: We introduce WordScape, a novel pipeline for the creation of cross-disciplinary, multilingual corpora comprising millions of pages with annotations for document layout detection. Relating visual and textual items on document pages has gained further significance with the advent of multimodal models. Various approaches proved effective for visual question answering or layout segmentation. However,…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks

  46. arXiv:2312.03989  [pdf, other]

    cs.LG cond-mat.mtrl-sci eess.IV physics.data-an

    Rapid detection of rare events from in situ X-ray diffraction data using machine learning

    Authors: Weijian Zheng, Jun-Sang Park, Peter Kenesei, Ahsan Ali, Zhengchun Liu, Ian T. Foster, Nicholas Schwarz, Rajkumar Kettimuthu, Antonino Miceli, Hemant Sharma

    Abstract: High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs o…

    Submitted 6 December, 2023; originally announced December 2023.

  47. arXiv:2312.03876  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

    Authors: Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Sandeep Madireddy, Aditya Grover

    Abstract: Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it…

    Submitted 22 October, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Neural Information Processing Systems (NeurIPS 2024)

  48. arXiv:2311.00787  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Accelerating Electronic Stopping Power Predictions by 10 Million Times with a Combination of Time-Dependent Density Functional Theory and Machine Learning

    Authors: Logan Ward, Ben Blaiszik, Cheng-Wei Lee, Troy Martin, Ian Foster, André Schleife

    Abstract: Knowing the rate at which particle radiation releases energy in a material, the stopping power, is key to designing nuclear reactors, medical treatments, semiconductor and quantum materials, and many other technologies. While the nuclear contribution to stopping power, i.e., elastic scattering between atoms, is well understood in the literature, the route for gathering data on the electronic contr…

    Submitted 25 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

  49. arXiv:2310.16270  [pdf, other]

    cs.CL cs.AI cs.LG

    Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

    Authors: Mansi Sakarvadia, Arham Khan, Aswathy Ajith, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster

    Abstract: Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks. Recent work has attempted to decode, by reverse engineering the role of linear layers, the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks. Yet little is known about the specific role of attention heads in producing the final token prediction. We propose…

    Submitted 24 October, 2023; originally announced October 2023.

  50. arXiv:2310.04610  [pdf, other]

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.