Showing 1–3 of 3 results for author: Gagliardelli, L

  1. arXiv:2312.12902  [pdf, other]

    cs.DB

    DXP: Billing Data Preparation for Big Data Analytics

    Authors: Luca Gagliardelli, Domenico Beneventano, Marco Esposito, Luca Zecchini, Giovanni Simonini, Sonia Bergamaschi, Fabio Miselli, Giuseppe Miano

    Abstract: In this paper, we present the data preparation activities that we performed for the Digital Experience Platform (DXP) project, commissioned and supervised by Doxee S.p.A. DXP manages the billing data of the users of different companies operating in various sectors (electricity and gas, telephony, pay TV, etc.). This data has to be processed to provide services to the users (e.g., interactive bill…

    Submitted 20 December, 2023; originally announced December 2023.

  2. Evaluation of Dataframe Libraries for Data Preparation on a Single Machine

    Authors: Angelo Mozzillo, Luca Zecchini, Luca Gagliardelli, Adeel Aslam, Sonia Bergamaschi, Giovanni Simonini

    Abstract: Data preparation is a trial-and-error process that typically involves countless iterations over the data to define the best pipeline of operators for a given task. With tabular data, practitioners often perform that burdensome activity on local machines by writing ad hoc scripts with libraries based on the Pandas dataframe API and testing them on samples of the entire dataset - the faster the librar…

    Submitted 21 November, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Journal ref: Proceedings 28th International Conference on Extending Database Technology, EDBT 2025, Barcelona, Spain, March 25-28, 2025 (pp. 337-349)

  3. arXiv:2204.08801  [pdf, other]

    cs.DB

    Generalized Supervised Meta-blocking (technical report)

    Authors: Luca Gagliardelli, George Papadakis, Giovanni Simonini, Sonia Bergamaschi, Themis Palpanas

    Abstract: Entity Resolution constitutes a core data integration task that relies on Blocking in order to tame its quadratic time complexity. Schema-agnostic blocking achieves very high recall, requires no domain knowledge and applies to data of any structuredness and schema heterogeneity. This comes at the cost of many irrelevant candidate pairs (i.e., comparisons), which can be significantly reduced throug…

    Submitted 19 April, 2022; originally announced April 2022.