Showing 1–2 of 2 results for author: Devarajan, H

Search v0.5.6 released 2020-02-24

arXiv:2501.04654 [pdf, other]

cs.DC cs.PF

Recorder: Comprehensive Parallel I/O Tracing and Analysis

Authors: Chen Wang, Izzet Yildirim, Hariharan Devarajan, Kathryn Mohror, Marc Snir

Abstract: This paper presents Recorder, a parallel I/O tracing tool designed to capture comprehensive I/O information on HPC applications. Recorder traces I/O calls across various I/O layers, storing all function parameters for each captured call. The volume of stored information scales linearly with the application's execution scale. To address this, we present a sophisticated pattern-recognition-based compression algorithm. This algorithm identifies and compresses recurring I/O patterns both within individual processes and across multiple processes, significantly reducing space and time overheads. We evaluate the proposed compression algorithm using I/O benchmarks and real-world applications, demonstrating that Recorder can store more information while requiring approximately 12x less storage space compared to its predecessor. Notably, for applications with typical parallel I/O patterns, Recorder achieves a constant trace size regardless of execution scale. Additionally, a comparison with the profiling tool Darshan shows that Recorder captures detailed I/O information without incurring substantial overhead. The richer data collected by Recorder enables new insights and facilitates more in-depth I/O studies, offering valuable contributions to the I/O research community.

Submitted 8 January, 2025; originally announced January 2025.

Comments: 29 pages. Under Review. Submitted to the Journal of Supercomputing

arXiv:2312.06131 [pdf, other]

cs.DC

ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems

Authors: Yiheng Xu, Pranav Sivaraman, Hariharan Devarajan, Kathryn Mohror, Abhinav Bhatele

Abstract: Parallel applications can spend a significant amount of time performing I/O on large-scale supercomputers. Fast near-compute storage accelerators called burst buffers can reduce the time a processor spends performing I/O and mitigate I/O bottlenecks. However, determining if a given application could be accelerated using burst buffers is not straightforward even for storage experts. The relationship between an application's I/O characteristics (such as I/O volume and processes involved) and the best storage sub-system for it can be complicated. As a result, adapting parallel applications to use burst buffers efficiently is a trial-and-error process. In this work, we present a Python-based tool called PrismIO that enables programmatic analysis of I/O traces. Using PrismIO, we identify bottlenecks on burst buffers and parallel file systems and explain why certain I/O patterns perform poorly. Further, we use machine learning to model the relationship between I/O characteristics and burst buffer selections. We run IOR (an I/O benchmark) with various I/O characteristics on different storage systems and collect performance data. We use the data as the input for training the model. Our model can predict if a file of an application should be placed on burst buffers for unseen IOR scenarios with an accuracy of 94.47% and for four real applications with an accuracy of 95.86%.

Submitted 11 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.
