-
SkyStore: Cost-Optimized Object Storage Across Regions and Clouds
Authors:
Shu Liu,
Xiangxi Mo,
Moshik Hershcovitch,
Henric Zhang,
Audrey Cheng,
Guy Girmonsky,
Gil Vernik,
Michael Factor,
Tiemo Bang,
Soujanya Ponnapalli,
Natacha Crooks,
Joseph E. Gonzalez,
Danny Harnik,
Ion Stoica
Abstract:
Modern applications span multiple clouds to reduce costs, avoid vendor lock-in, and leverage low-availability resources in another cloud. However, standard object stores operate within a single cloud, forcing users to manually manage data placement across clouds, i.e., navigate their diverse APIs and handle heterogeneous costs for network and storage. This is often a complex choice: users must either pay to store objects in a remote cloud, or pay to transfer them over the network based on application access patterns and cloud provider cost offerings. To address this, we present SkyStore, a unified object store that addresses cost-optimal data management across regions and clouds. SkyStore introduces a virtual object and bucket API to hide the complexity of interacting with multiple clouds. At its core, SkyStore has a novel TTL-based data placement policy that dynamically replicates and evicts objects according to application access patterns while optimizing for lower cost. Our evaluation shows that across various workloads, SkyStore reduces the overall cost by up to 6x over academic baselines and commercial alternatives like AWS multi-region buckets. SkyStore also has comparable latency, and its availability and fault tolerance are on par with standard cloud offerings. We release the data and code of SkyStore at https://github.com/skyplane-project/skystore.
Submitted 28 February, 2025;
originally announced February 2025.
-
ZipNN: Lossless Compression for AI Models
Authors:
Moshik Hershcovitch,
Andrew Wood,
Leshem Choshen,
Guy Girmonsky,
Roy Leibovitz,
Ilias Ennmouri,
Michal Malka,
Peter Chin,
Swaminathan Sundararaman,
Danny Harnik
Abstract:
With the growth of model sizes and the scale of their deployment, their sheer size burdens the infrastructure, requiring more network and more storage to accommodate them. While there is a vast model compression literature on deleting parts of the model weights for faster inference, we investigate a more traditional type of compression - one that represents the model in a compact form and is coupled with a decompression algorithm that returns it to its original form and size - namely lossless compression.
We present ZipNN, a lossless compression method tailored to neural networks. Somewhat surprisingly, we show that specific lossless compression can gain significant network and storage reduction on popular models, often saving 33% and at times reducing over 50% of the model size. We investigate the source of model compressibility and introduce specialized compression variants tailored for models that further increase the effectiveness of compression. On popular models (e.g., Llama 3), ZipNN shows space savings that are over 17% better than vanilla compression while also improving compression and decompression speeds by 62%. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like Hugging Face.
Submitted 4 June, 2025; v1 submitted 7 November, 2024;
originally announced November 2024.
-
Lossless and Near-Lossless Compression for Foundation Models
Authors:
Moshik Hershcovitch,
Leshem Choshen,
Andrew Wood,
Ilias Enmouri,
Peter Chin,
Swaminathan Sundararaman,
Danny Harnik
Abstract:
With the growth of model sizes and scale of their deployment, their sheer size burdens the infrastructure, requiring more network and more storage to accommodate them. While there is a vast literature about reducing model sizes, we investigate a more traditional type of compression -- one that compresses the model to a smaller form and is coupled with a decompression algorithm that returns it to its original size -- namely lossless compression. Somewhat surprisingly, we show that such lossless compression can gain significant network and storage reduction on popular models, at times reducing over $50\%$ of the model size. We investigate the source of model compressibility, introduce compression variants tailored for models, and categorize models into compressibility groups. We also introduce a tunable lossy compression technique that can further reduce size even on the less compressible models with little to no effect on the model accuracy. We estimate that these methods could save over an ExaByte per month of network traffic downloaded from a large model hub like Hugging Face.
Submitted 5 April, 2024;
originally announced April 2024.
-
Prefix Siphoning: Exploiting LSM-Tree Range Filters For Information Disclosure (Full Version)
Authors:
Adi Kaufman,
Moshik Hershcovitch,
Adam Morrison
Abstract:
Key-value stores typically leave access control to the systems for which they act as storage engines. Unfortunately, attackers may circumvent such read access controls via timing attacks on the key-value store, which use differences in query response times to glean information about stored data.
To date, key-value store timing attacks have aimed to disclose stored values and have exploited external mechanisms that can be disabled for protection. In this paper, we point out that key disclosure is also a security threat -- and demonstrate key disclosure timing attacks that exploit mechanisms of the key-value store itself.
We target LSM-tree based key-value stores utilizing range filters, which have been recently proposed to optimize LSM-tree range queries. We analyze the impact of the range filters SuRF and prefix Bloom filter on LSM-trees through a security lens, and show that they enable a key disclosure timing attack, which we call prefix siphoning. Prefix siphoning successfully leverages benign queries for non-present keys to identify prefixes of actual keys -- and in some cases, full keys -- in scenarios where brute force searching for keys (via exhaustive enumeration or random guesses) is infeasible.
Submitted 8 September, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Fast Feature Selection with Fairness Constraints
Authors:
Francesco Quinzan,
Rajiv Khanna,
Moshik Hershcovitch,
Sarel Cohen,
Daniel G. Waddington,
Tobias Friedrich,
Michael W. Mahoney
Abstract:
We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants. To address this challenge, we extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work. Furthermore, our extension allows the use of downward-closed constraints, which can be used to encode certain fairness criteria into the feature selection process. We prove strong approximation guarantees for the algorithm based on standard assumptions. These guarantees are applicable to many parametric models, including Generalized Linear Models. Finally, we demonstrate empirically that the proposed algorithm competes favorably with state-of-the-art techniques for feature selection, on real-world and synthetic datasets.
Submitted 3 February, 2023; v1 submitted 28 February, 2022;
originally announced February 2022.
-
Non-Volatile Memory Accelerated Geometric Multi-Scale Resolution Analysis
Authors:
Andrew Wood,
Moshik Hershcovitch,
Daniel Waddington,
Sarel Cohen,
Meredith Wolf,
Hongjun Suh,
Weiyu Zong,
Peter Chin
Abstract:
Dimensionality reduction algorithms are standard tools in a researcher's toolbox. They are frequently used to augment downstream tasks in machine learning and data science, and also serve as exploratory methods for understanding complex phenomena. For instance, dimensionality reduction is commonly used in Biology as well as Neuroscience to understand data collected from biological subjects. However, dimensionality reduction techniques are limited by the von Neumann architectures that they execute on. Specifically, data-intensive algorithms such as dimensionality reduction techniques often require fast, high-capacity, persistent memory, a combination that hardware has historically been unable to provide at the same time. In this paper, we present a re-implementation of an existing dimensionality reduction technique called Geometric Multi-Scale Resolution Analysis (GMRA), which has been accelerated via a novel persistent memory technology called Memory Centric Active Storage (MCAS). Our implementation uses a specialized version of MCAS called PyMM that provides native support for Python datatypes including NumPy arrays and PyTorch tensors. We compare our PyMM implementation against a DRAM implementation, and show that when data fits in DRAM, PyMM offers competitive runtimes. When data does not fit in DRAM, our PyMM implementation is still able to process the data.
Submitted 21 February, 2022;
originally announced February 2022.
-
Non-Volatile Memory Accelerated Posterior Estimation
Authors:
Andrew Wood,
Moshik Hershcovitch,
Daniel Waddington,
Sarel Cohen,
Peter Chin
Abstract:
Bayesian inference allows machine learning models to express uncertainty. Current machine learning models use only a single learnable parameter combination when making predictions, and as a result are highly overconfident when their predictions are wrong. To use more learnable parameter combinations efficiently, these samples must be drawn from the posterior distribution. Unfortunately, computing the posterior directly is infeasible, so researchers often approximate it with a well-known distribution such as a Gaussian. In this paper, we show that through the use of high-capacity persistent storage, models whose posterior distribution was too big to approximate are now feasible, leading to improved predictions in downstream tasks.
Submitted 21 February, 2022;
originally announced February 2022.
-
A High-Performance Persistent Memory Key-Value Store with Near-Memory Compute
Authors:
Daniel Waddington,
Clem Dickey,
Luna Xu,
Moshik Hershcovitch,
Sangeetha Seshadri
Abstract:
MCAS (Memory Centric Active Storage) is a persistent memory tier for high-performance durable data storage. It is designed from the ground up to provide a key-value capability with low-latency guarantees and data durability through memory persistence and replication. To reduce data movement and make further gains in performance, we provide support for user-defined "push-down" operations (known as Active Data Objects, or ADOs) that can execute directly and safely on the value-memory associated with one or more keys. The ADO mechanism allows complex pointer-based dynamic data structures (e.g., trees) to be stored and operated on in persistent memory. To this end, we examine a real-world use case for MCAS-ADO in the handling of enterprise storage system metadata for Continuous Data Protection (CDP). This requires continuously updating complex metadata that must be kept consistent and durable. In this paper, we (i) present the MCAS-ADO system architecture, (ii) show how the CDP use case is implemented, and (iii) give an evaluation of system performance in the context of this use case.
Submitted 12 April, 2021;
originally announced April 2021.
-
An Architecture for Memory Centric Active Storage (MCAS)
Authors:
Daniel Waddington,
Clem Dickey,
Moshik Hershcovitch,
Sangeetha Seshadri
Abstract:
The advent of CPU-attached persistent memory technology, such as Intel's Optane Persistent Memory Modules (PMM), has brought with it new opportunities for storage. In 2018, IBM Research Almaden began investigating and developing a new enterprise-grade storage solution directly aimed at this emerging technology. MCAS (Memory Centric Active Storage) defines an evolved network-attached key-value store that offers both near-data compute and the ability to layer enterprise-grade data management services on shared persistent memory. As a converged memory-storage tier, MCAS moves towards eliminating the traditional separation of compute and storage, and thereby unifying the data space. This paper provides an in-depth review of the MCAS architecture and implementation, as well as general performance results.
Submitted 21 May, 2021; v1 submitted 26 February, 2021;
originally announced March 2021.
-
Minimal Indices for Successor Search
Authors:
Sarel Cohen,
Amos Fiat,
Moshik Hershcovitch,
Haim Kaplan
Abstract:
We give a new successor data structure which improves upon the index size of the Pătrașcu-Thorup data structures, reducing the index size from $O(n w^{4/5})$ bits to $O(n \log w)$ bits, with optimal probe complexity. Alternatively, our new data structure can be viewed as matching the space complexity of the (probe-suboptimal) $z$-fast trie of Belazzougui et al. Thus, we get the best of both approaches with respect to both probe count and index size. The penalty we pay is an extra $O(\log w)$ inter-register operations. Our data structure can also be used to solve the weak prefix search problem; the index size of $O(n \log w)$ bits is known to be optimal for any such data structure.
The technical contributions include highly efficient single-word indices, with out-degree $w/\log w$ (compared to the $w^{1/5}$ out-degree of fusion-tree-based indices). To construct such highly efficient single-word indices, we devise highly efficient bit selectors which, we believe, are of independent interest.
Submitted 17 June, 2013;
originally announced June 2013.