-
Keigo: Co-designing Log-Structured Merge Key-Value Stores with a Non-Volatile, Concurrency-aware Storage Hierarchy (Extended Version)
Authors:
Rúben Adão,
Zhongjie Wu,
Changjun Zhou,
Oana Balmau,
João Paulo,
Ricardo Macedo
Abstract:
We present Keigo, a concurrency- and workload-aware storage middleware that enhances the performance of log-structured merge key-value stores (LSM KVS) when they are deployed on a hierarchy of storage devices. The key observation behind Keigo is that there is no one-size-fits-all placement of data across the storage hierarchy that optimizes for all workloads. Hence, to leverage the benefits of combining different storage devices, Keigo places files across different devices based on their parallelism, I/O bandwidth, and capacity. We introduce three techniques: concurrency-aware data placement, persistent read-only caching, and context-based I/O differentiation. Keigo is portable across different LSMs, is adaptable to dynamic workloads, and does not require extensive profiling. Our system enables established production KVS such as RocksDB, LevelDB, and Speedb to benefit from heterogeneous storage setups. We evaluate Keigo using synthetic and realistic workloads, showing that it improves the throughput of production-grade LSMs by up to 4x for write-heavy and 18x for read-heavy workloads when compared to general-purpose storage systems and specialized LSM KVS.
Submitted 17 June, 2025;
originally announced June 2025.
-
Diagnosing applications' I/O behavior through system call observability
Authors:
Tânia Esteves,
Ricardo Macedo,
Rui Oliveira,
João Paulo
Abstract:
We present DIO, a generic tool for observing inefficient and erroneous I/O interactions between applications and in-kernel storage systems that lead to performance, dependability, and correctness issues. DIO facilitates the analysis and enables near real-time visualization of complex I/O patterns for data-intensive applications generating millions of storage requests. This is achieved by non-intrusively intercepting system calls, enriching collected data with relevant context, and providing timely analysis and visualization for traced events. We demonstrate its usefulness by analyzing two production-level applications. Results show that DIO enables diagnosing resource contention in multi-threaded I/O that leads to high tail latency and erroneous file accesses that cause data loss.
Submitted 17 April, 2023;
originally announced April 2023.
-
PADLL: Taming Metadata-intensive HPC Jobs Through Dynamic, Application-agnostic QoS Control
Authors:
Ricardo Macedo,
Mariana Miranda,
Yusuke Tanimura,
Jason Haga,
Amit Ruhela,
Stephen Lien Harrell,
Richard Todd Evans,
José Pereira,
João Paulo
Abstract:
Modern I/O applications that run on HPC infrastructures are increasingly becoming read and metadata intensive. However, having multiple concurrent applications submitting large amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to overall performance degradation and I/O unfairness. We present PADLL, an application and file system agnostic storage middleware that enables QoS control of data and metadata workflows in HPC storage systems. It adopts ideas from Software-Defined Storage, building data plane stages that mediate and rate limit POSIX requests submitted to the shared file system, and a control plane that holistically coordinates how all I/O workflows are handled. We demonstrate its performance and feasibility under multiple QoS policies using synthetic benchmarks, real-world applications, and traces collected from a production file system. Results show that PADLL can enforce complex storage QoS policies over concurrent metadata-aggressive jobs, ensuring fairness and prioritization.
Submitted 23 March, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
PAIO: A Software-Defined Storage Data Plane Framework
Authors:
Ricardo Macedo,
Yusuke Tanimura,
Jason Haga,
Vijay Chidambaram,
José Pereira,
João Paulo
Abstract:
We propose PAIO, the first general-purpose framework that enables system designers to build custom-made Software-Defined Storage (SDS) data plane stages. It provides the means to implement storage optimizations adaptable to different workflows and user-defined policies, and allows straightforward integration with existing applications and I/O layers. PAIO allows stages to be integrated with modern SDS control planes to ensure holistic control and system-wide optimal performance. We demonstrate the performance and applicability of PAIO with two use cases. The first improves 99th percentile latency by 4x in industry-standard LSM-based key-value stores. The second ensures dynamic per-application bandwidth guarantees under shared storage environments.
Submitted 12 August, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
CNN-based Approaches For Cross-Subject Classification in Motor Imagery: From The State-of-The-Art to DynamicNet
Authors:
Alberto Zancanaro,
Giulia Cisotto,
João Ruivo Paulo,
Gabriel Pires,
Urbano J. Nunes
Abstract:
Motor imagery (MI)-based brain-computer interface (BCI) systems are being increasingly employed to provide alternative means of communication and control for people suffering from neuro-motor impairments, with a special effort to bring these systems out of the controlled lab environments. Hence, accurately classifying MI from brain signals, e.g., from electroencephalography (EEG), is essential to obtain reliable BCI systems. However, MI classification is still a challenging task, because the signals are characterized by poor SNR and high intra-subject and cross-subject variability. Deep learning approaches have started to emerge as valid alternatives to standard machine learning techniques, e.g., filter bank common spatial pattern (FBCSP), to extract subject-independent features and to increase the cross-subject classification performance of MI BCI systems. In this paper, we first present a review of the most recent studies using deep learning for MI classification, with particular attention to their cross-subject performance. Second, we propose DynamicNet, a Python-based tool for quick and flexible implementations of deep learning models based on convolutional neural networks. We showcase the potential of DynamicNet by implementing EEGNet, a well-established architecture for effective EEG classification. Finally, we compare its performance with FBCSP in a 4-class MI classification over public datasets. To explore its cross-subject classification ability, we applied three different cross-validation schemes. Our results demonstrate that DynamicNet-implemented EEGNet outperforms FBCSP by about 25%, with a statistically significant difference when cross-subject validation schemes are applied.
Submitted 17 May, 2021;
originally announced May 2021.
-
Asteroids' physical models from combined dense and sparse photometry and scaling of the YORP effect by the observed obliquity distribution
Authors:
J. Hanuš,
J. Ďurech,
M. Brož,
A. Marciniak,
B. D. Warner,
F. Pilcher,
R. Stephens,
R. Behrend,
B. Carry,
D. Čapek,
P. Antonini,
M. Audejean,
K. Augustesen,
E. Barbotin,
P. Baudouin,
A. Bayol,
L. Bernasconi,
W. Borczyk,
J. -G. Bosch,
E. Brochard,
L. Brunetto,
S. Casulli,
A. Cazenave,
S. Charbonnel,
B. Christophe, et al. (95 additional authors not shown)
Abstract:
The larger number of models of asteroid shapes and their rotational states derived by lightcurve inversion gives us better insight into both the nature of individual objects and the whole asteroid population. With a larger statistical sample we can study the physical properties of asteroid populations, such as main-belt asteroids or individual asteroid families, in more detail. Shape models can also be used in combination with other types of observational data (IR, adaptive optics images, stellar occultations), e.g., to determine sizes and thermal properties. We use all available photometric data of asteroids to derive their physical models by the lightcurve inversion method and compare the observed pole latitude distributions of all asteroids with known convex shape models with the simulated pole latitude distributions. We used classical dense photometric lightcurves from several sources and sparse-in-time photometry from the U.S. Naval Observatory in Flagstaff, Catalina Sky Survey, and La Palma surveys (IAU codes 689, 703, 950) in the lightcurve inversion method to determine asteroid convex models and their rotational states. We also extended a simple dynamical model for the spin evolution of asteroids used in our previous paper. We present 119 new asteroid models derived from combined dense and sparse-in-time photometry. We discuss the reliability of asteroid shape models derived only from Catalina Sky Survey data (IAU code 703) and present 20 such models. By using different values for a scaling parameter cYORP (which corresponds to the magnitude of the YORP momentum) in the dynamical model for the spin evolution and by comparing synthetic and observed pole-latitude distributions, we were able to constrain the typical values of the cYORP parameter to between 0.05 and 0.6.
Submitted 29 January, 2013;
originally announced January 2013.