Deep Lake: a Lakehouse for Deep Learning
Authors:
Sasun Hambardzumyan,
Abhinav Tuli,
Levon Ghukasyan,
Fariz Rahman,
Hrant Topchyan,
David Isayan,
Mark McQuade,
Mikayel Harutyunyan,
Tatevik Hakobyan,
Ivo Stranic,
Davit Buniatyan
Abstract:
Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage increases, traditional data lakes are not well designed for applications such as natural language processing (NLP), audio processing, computer vision, and applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake maintains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, and annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) Tensor Query Language, (b) an in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, and JAX, and they integrate with numerous MLOps tools.
Submitted 13 December, 2022; v1 submitted 22 September, 2022;
originally announced September 2022.
The hardness of the independence and matching clutter of a graph
Authors:
Sasun Hambardzumyan,
Vahan V. Mkrtchyan,
Vahe L. Musoyan,
Hovhannes Sargsyan
Abstract:
A {\it clutter} (or {\it antichain} or {\it Sperner family}) $L$ is a pair $(V,E)$, where $V$ is a finite set and $E$ is a family of subsets of $V$ none of which is a subset of another. Usually, the elements of $V$ are called {\it vertices} of $L$, and the elements of $E$ are called {\it edges} of $L$. A subset $s_e$ of an edge $e$ of a clutter is called {\it recognizing} for $e$ if $s_e$ is not a subset of another edge. The {\it hardness} of an edge $e$ of a clutter is the ratio of the size of $e$'s smallest recognizing subset to the size of $e$. The hardness of a clutter is the maximum hardness of its edges. We study the hardness of clutters arising from independent sets and matchings of graphs.
Submitted 9 December, 2015; v1 submitted 27 March, 2009;
originally announced March 2009.
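The definitions in the abstract above (recognizing subset, edge hardness, clutter hardness) can be illustrated with a small brute-force sketch. This is not the authors' method; the function names (`normalize`, `is_clutter`, `smallest_recognizing_size`, `edge_hardness`, `clutter_hardness`) and the example clutter are hypothetical and chosen only for illustration. The sketch assumes every edge is non-empty.

```python
from fractions import Fraction
from itertools import combinations


def normalize(edges):
    """Convert an iterable of vertex collections into a list of frozensets."""
    return [frozenset(e) for e in edges]


def is_clutter(edges):
    """Return True if no edge is a subset of a different edge (repeats also fail)."""
    es = normalize(edges)
    return all(not a <= b
               for i, a in enumerate(es)
               for j, b in enumerate(es) if i != j)


def smallest_recognizing_size(edge, edges):
    """Size of the smallest subset of `edge` that lies in no other edge."""
    edge = frozenset(edge)
    others = [f for f in normalize(edges) if f != edge]
    # Try subset sizes in increasing order; the first hit is the minimum.
    for k in range(len(edge) + 1):
        for subset in combinations(edge, k):
            if not any(frozenset(subset) <= f for f in others):
                return k
    # Unreachable for a genuine clutter: the edge itself is always recognizing.
    raise ValueError("edge is contained in another edge; input is not a clutter")


def edge_hardness(edge, edges):
    """Ratio of the smallest recognizing subset size to the edge size."""
    edge = frozenset(edge)
    if not edge:
        raise ValueError("hardness is undefined here for an empty edge")
    return Fraction(smallest_recognizing_size(edge, edges), len(edge))


def clutter_hardness(edges):
    """Maximum edge hardness over all edges of the clutter."""
    es = normalize(edges)
    if not is_clutter(es):
        raise ValueError("input is not a clutter")
    return max(edge_hardness(e, es) for e in es)


# Hypothetical example: V = {1, 2, 3, 4}, E = {{1, 2}, {2, 3}, {3, 4}}.
example = [{1, 2}, {2, 3}, {3, 4}]
print(edge_hardness({1, 2}, example))   # {1} is recognizing, so 1/2
print(edge_hardness({2, 3}, example))   # only {2, 3} itself works, so 1
print(clutter_hardness(example))        # maximum over edges, so 1
```

The search enumerates all subsets of each edge, so it is exponential in the edge size and only suitable for small examples.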