Showing 1–2 of 2 results for author: Panadero, R

  1. arXiv:2412.06637  [pdf, other]

    cs.DB

    FREYJA: Efficient Join Discovery in Data Lakes

    Authors: Marc Maynou, Sergi Nadal, Raquel Panadero, Javier Flores, Oscar Romero, Anna Queralt

    Abstract: Data lakes are massive repositories of raw and heterogeneous data, designed to meet the requirements of modern data storage. Nonetheless, this same philosophy increases the complexity of performing discovery tasks to find relevant data for subsequent processing. As a response to these growing challenges, we present FREYJA, a modern data discovery system capable of effectively exploring data lakes,…

    Submitted 9 December, 2024; originally announced December 2024.

  2. arXiv:2305.19629  [pdf, other]

    cs.DB

    Measuring and Predicting the Quality of a Join for Data Discovery

    Authors: Sergi Nadal, Raquel Panadero, Javier Flores, Oscar Romero

    Abstract: We study the problem of discovering joinable datasets at scale. We approach the problem from a learning perspective relying on profiles. These are succinct representations that capture the underlying characteristics of the schemata and data values of datasets, which can be efficiently extracted in a distributed and parallel fashion. Profiles are then compared, to predict the quality of a join oper…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2012.00890