-
Improving Oil Slick Trajectory Simulations with Bayesian Optimization
Authors:
Gabriele Accarino,
Marco M. De Carlo,
Igor Atake,
Donatello Elia,
Anusha L. Dissanayake,
Antonio Augusto Sepp Neves,
Juan Peña Ibañez,
Italo Epicoco,
Paola Nassisi,
Sandro Fiore,
Giovanni Coppini
Abstract:
Accurate simulations of oil spill trajectories are essential for supporting practitioners' response and mitigating environmental and socioeconomic impacts. Numerical models, such as MEDSLIK-II, simulate advection, dispersion, and transformation processes of oil particles. However, simulations heavily rely on accurate parameter tuning, still based on expert knowledge and manual calibration. To overcome these limitations, we integrate the MEDSLIK-II numerical oil spill model with a Bayesian optimization framework to iteratively estimate the best physical parameter configuration that yields simulations closer to satellite observations of the slick. We focus on key parameters, such as horizontal diffusivity and drift factor, maximizing the Fraction Skill Score (FSS) as a measure of spatio-temporal overlap between simulated and observed oil distributions. We validate the framework for the Baniyas oil incident that occurred in Syria between August 23 and September 4, 2021, which released over 12,000 $m^3$ of oil. We show that, on average, the proposed approach systematically improves the FSS from 5.82% to 11.07% compared to control simulations initialized with default parameters. The optimization results in consistent improvement across multiple time steps, particularly during periods of increased drift variability, demonstrating the robustness of our method in dynamic environmental conditions.
Submitted 4 March, 2025;
originally announced March 2025.
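To illustrate the overlap score named in the abstract above, here is a minimal sketch of a neighborhood-based Fractions Skill Score between two binary oil masks. It assumes the commonly used definition, FSS = 1 - MSE / MSE_ref computed on box-filtered fractions. The window size, zero padding, grid, and function names are hypothetical, and the paper's own implementation (including how time steps are aggregated) may differ.

```python
import numpy as np

def neighborhood_fractions(mask, window):
    """Fraction of oiled cells in a window x window box around each cell.

    Assumes an odd window; cells outside the grid count as oil-free.
    """
    pad = window // 2
    padded = np.pad(mask.astype(float), pad, mode="constant")
    rows, cols = mask.shape
    total = np.zeros((rows, cols), dtype=float)
    # Sum the shifted copies of the padded grid that make up the box.
    for dy in range(window):
        for dx in range(window):
            total += padded[dy:dy + rows, dx:dx + cols]
    return total / (window * window)

def fss(simulated, observed, window=3):
    """Fractions Skill Score: 1 for identical fields, 0 for no overlap."""
    pf = neighborhood_fractions(simulated, window)
    po = neighborhood_fractions(observed, window)
    mse = np.sum((pf - po) ** 2)
    reference = np.sum(pf ** 2) + np.sum(po ** 2)
    if reference == 0:
        return float("nan")  # both fields are empty; the score is undefined
    return 1.0 - mse / reference

if __name__ == "__main__":
    observed = np.zeros((10, 10), dtype=bool)
    observed[4:6, 4:6] = True
    simulated = np.roll(observed, shift=1, axis=1)  # slick displaced by one cell
    print(fss(simulated, observed, window=3))
```

In an optimization loop, a score like this would serve as the objective evaluated for each candidate parameter configuration (for example, horizontal diffusivity and drift factor).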
-
Space-Efficient Private Estimation of Quantiles
Authors:
Massimo Cafaro,
Angelo Coluccia,
Italo Epicoco,
Marco Pulimeno
Abstract:
Fast and accurate estimation of quantiles on data streams coming from communication networks, Internet of Things (IoT), and the like, is at the heart of important data processing applications including statistical analysis, latency monitoring, query optimization for parallel database management systems, and more. Indeed, quantiles are more robust indicators for the underlying distribution, compared to moment-based indicators such as mean and variance. The streaming setting additionally constrains accurate tracking of quantiles, as stream items may arrive at a very high rate and must be processed as quickly as possible and then discarded, since storing them is usually infeasible. Since an exact solution is only possible when data are fully stored, the goal in practical contexts is to provide an approximate solution with a provably guaranteed bound on the approximation error committed, while using a minimal amount of space. At the same time, with the increasing amount of personal and sensitive information exchanged, it is essential to design privacy protection techniques to ensure confidentiality and data integrity. In this paper we present the following differentially private streaming algorithms for frugal estimation of a quantile: \textsc{DP-Frugal-1U-L}, \textsc{DP-Frugal-1U-G}, \textsc{DP-Frugal-1U-$ρ$}. Frugality refers to the ability of the algorithms to provide a good approximation to the sought quantile using a modest amount of space, either one or two units of memory. We provide a theoretical analysis and experimental results.
Submitted 12 June, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
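For context on the frugal setting described in the abstract above, the following is a minimal sketch of the classic, non-private Frugal-1U update rule, which keeps a single integer as its whole state. The function name, the seed handling, and the synthetic stream are assumptions made for illustration. The differentially private mechanisms of the DP-Frugal-1U variants are not described in the abstract and are not reproduced here.

```python
import random

def frugal_1u(stream, q, seed=0):
    """Estimate the q-quantile of an integer stream using one unit of memory."""
    rng = random.Random(seed)
    m = 0  # the only stored value: the current quantile estimate
    for s in stream:
        r = rng.random()
        if s > m and r > 1 - q:
            m += 1  # move up with probability q when the item is larger
        elif s < m and r > q:
            m -= 1  # move down with probability 1 - q when the item is smaller
    return m

if __name__ == "__main__":
    data_rng = random.Random(42)
    stream = [data_rng.randint(0, 1000) for _ in range(200_000)]
    print("estimated median:", frugal_1u(stream, q=0.5))
```

Each item is inspected once and then dropped, so the memory footprint stays constant regardless of stream length. The estimate drifts toward the target quantile by at most one unit per item.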
-
Just In Time Transformers
Authors:
Ahmed Ala Eddine Benali,
Massimo Cafaro,
Italo Epicoco,
Marco Pulimeno,
Enrico Junior Schioppa
Abstract:
Precise energy load forecasting in residential households is crucial for mitigating carbon emissions and enhancing energy efficiency; indeed, accurate forecasting enables utility companies and policymakers, who advocate sustainable energy practices, to optimize resource utilization. Moreover, smart meters provide valuable information by allowing for granular insights into consumption patterns. Building upon available smart meter data, our study aims to cluster consumers into distinct groups according to their energy usage behaviours, effectively capturing a diverse spectrum of consumption patterns. Next, we design JITtrans (Just In Time transformer), a novel transformer deep learning model that significantly improves energy consumption forecasting accuracy, with respect to traditional forecasting methods. Extensive experimental results validate our claims using proprietary smart meter data. Our findings highlight the potential of advanced predictive technologies to revolutionize energy management and advance sustainable power systems: the development of efficient and eco-friendly energy solutions critically depends on such technologies.
Submitted 22 November, 2024; v1 submitted 22 October, 2024;
originally announced October 2024.
-
Data stream fusion for accurate quantile tracking and analysis
Authors:
Massimo Cafaro,
Catiuscia Melle,
Italo Epicoco,
Marco Pulimeno
Abstract:
UDDSKETCH is a recent algorithm for accurate tracking of quantiles in data streams, derived from the DDSKETCH algorithm. UDDSKETCH provides accuracy guarantees covering the full range of quantiles independently of the input distribution and greatly improves the accuracy with regard to DDSKETCH. In this paper we show how to compress and fuse data streams (or datasets) by using UDDSKETCH data summaries that are fused into a new summary related to the union of the streams (or datasets) processed by the input summaries whilst preserving both the error and size guarantees provided by UDDSKETCH. This property of sketches, known as mergeability, enables parallel and distributed processing. We prove that UDDSKETCH is fully mergeable and introduce a parallel version of UDDSKETCH suitable for message-passing based architectures. We formally prove its correctness and compare it to a parallel version of DDSKETCH, showing through extensive experimental results that our parallel algorithm almost always outperforms the parallel DDSKETCH algorithm with regard to the overall accuracy in determining the quantiles.
Submitted 17 January, 2021;
originally announced January 2021.
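The mergeability property mentioned in the abstract above can be illustrated with a small sketch: two logarithmic-bucket summaries built with the same base are fused by adding their bucket counts, and the result equals the summary of the concatenated data. The function names are hypothetical, and the sketch assumes both summaries share the same `gamma` and that inputs are strictly positive. Aligning summaries with different accuracies and enforcing a bucket budget after the merge are left out.

```python
import math
from collections import Counter

def bucket_index(x, gamma):
    """Index i such that gamma**(i-1) < x <= gamma**i (x must be positive)."""
    return math.ceil(math.log(x, gamma))

def summarize(values, gamma):
    """Build a bucket-count summary of a dataset."""
    return Counter(bucket_index(v, gamma) for v in values)

def merge(summary_a, summary_b):
    """Fuse two summaries built with the same gamma by adding bucket counts."""
    return summary_a + summary_b

if __name__ == "__main__":
    alpha = 0.01
    gamma = (1 + alpha) / (1 - alpha)
    part_1 = [1.0, 2.5, 40.0]
    part_2 = [2.5, 7.0]
    fused = merge(summarize(part_1, gamma), summarize(part_2, gamma))
    # The fused summary matches a summary built directly from the union.
    print(fused == summarize(part_1 + part_2, gamma))
```

Because the merge is a per-bucket addition, each worker in a message-passing setting can summarize its own partition and the partial summaries can be combined in any order.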
-
UDDSketch: Accurate Tracking of Quantiles in Data Streams
Authors:
Italo Epicoco,
Catiuscia Melle,
Massimo Cafaro,
Marco Pulimeno,
Giuseppe Morleo
Abstract:
We present UDDSketch (Uniform DDSketch), a novel sketch for fast and accurate tracking of quantiles in data streams. This sketch is heavily inspired by the recently introduced DDSketch, and is based on a novel bucket collapsing procedure that overcomes the intrinsic limits of the corresponding DDSketch procedures. Indeed, the DDSketch bucket collapsing procedure does not allow the derivation of formal guarantees on the accuracy of quantile estimation for data that do not follow a sub-exponential distribution. By contrast, UDDSketch is designed so that accuracy guarantees can be given over the full range of quantiles and for arbitrary input distributions. Moreover, our algorithm fully exploits the budgeted memory adaptively in order to guarantee the best possible accuracy over the full range of quantiles. Extensive experimental results on synthetic datasets confirm the validity of our approach.
Submitted 18 April, 2020;
originally announced April 2020.
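As a rough illustration of the idea in the abstract above, the sketch below maps positive values to logarithmic buckets and, when the bucket budget is exceeded, merges every pair of adjacent buckets so that the error guarantee degrades uniformly across all quantiles. The class name, default parameters, and restriction to positive values are assumptions. This is a simplified reading of a uniform collapse, not the authors' reference implementation.

```python
import math
from collections import Counter

class UniformSketch:
    def __init__(self, alpha=0.01, max_buckets=64):
        self.alpha = alpha                        # current relative-error bound
        self.gamma = (1 + alpha) / (1 - alpha)    # bucket i covers (gamma**(i-1), gamma**i]
        self.max_buckets = max_buckets
        self.buckets = Counter()
        self.count = 0

    def add(self, x):
        """Insert a positive value, collapsing while over the bucket budget."""
        self.buckets[math.ceil(math.log(x, self.gamma))] += 1
        self.count += 1
        while len(self.buckets) > self.max_buckets:
            self._collapse()

    def _collapse(self):
        """Merge buckets 2j-1 and 2j into bucket j; the base becomes gamma**2."""
        merged = Counter()
        for i, c in self.buckets.items():
            merged[math.ceil(i / 2)] += c
        self.buckets = merged
        self.gamma = self.gamma ** 2
        self.alpha = 2 * self.alpha / (1 + self.alpha ** 2)

    def quantile(self, q):
        """Return a representative value of the bucket holding the q-quantile."""
        rank = q * (self.count - 1)
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                return 2 * self.gamma ** i / (self.gamma + 1)
        return None  # empty sketch

if __name__ == "__main__":
    sketch = UniformSketch(alpha=0.01, max_buckets=64)
    for value in range(1, 10_001):
        sketch.add(value)
    print(sketch.quantile(0.5), sketch.alpha, len(sketch.buckets))
```

The updated bound follows from squaring the base: with gamma = (1 + alpha) / (1 - alpha), the new base gamma^2 corresponds to alpha' = 2 * alpha / (1 + alpha^2).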
-
Fast Detection of Outliers in Data Streams with the $Q_n$ Estimator
Authors:
Massimo Cafaro,
Catiuscia Melle,
Marco Pulimeno,
Italo Epicoco
Abstract:
We present FQN (Fast $Q_n$), a novel algorithm for fast detection of outliers in data streams. The algorithm works in the sliding window model, checking if an item is an outlier by cleverly computing the $Q_n$ scale estimator in the current window. We thoroughly compare our algorithm for online $Q_n$ with the state-of-the-art competing algorithm by Nunkesser et al., and show that FQN (i) is faster, (ii) has a computational complexity that does not depend on the input distribution, and (iii) requires less space. Extensive experimental results on synthetic datasets confirm the validity of our approach.
Submitted 9 January, 2020; v1 submitted 6 October, 2019;
originally announced October 2019.
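To make the estimator in the abstract above concrete, here is a naive quadratic-time sketch of the $Q_n$ scale estimator on a single window, using its standard definition: the $k$-th smallest pairwise absolute difference, with $k = \binom{h}{2}$ and $h = \lfloor n/2 \rfloor + 1$, scaled by the Gaussian consistency factor 2.2219. The outlier rule (distance from the window median in units of $Q_n$) and the threshold of 3 are assumptions for illustration. This is not the FQN algorithm, which is designed to avoid recomputing everything per window.

```python
from itertools import combinations
from statistics import median

def qn_scale(window, d=2.2219):
    """Naive Q_n: the k-th smallest |x_i - x_j| over i < j, times d."""
    n = len(window)
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(window, 2))
    return d * diffs[k - 1]

def is_outlier(x, window, threshold=3.0):
    """Flag x when it lies more than `threshold` Q_n units from the window median."""
    scale = qn_scale(window)
    center = median(window)
    if scale == 0:
        return x != center  # degenerate window: any deviation stands out
    return abs(x - center) / scale > threshold

if __name__ == "__main__":
    window = [1, 2, 3, 4, 5]
    print(qn_scale(window))
    print(is_outlier(100, window), is_outlier(4, window))
```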
-
Distributed mining of time--faded heavy hitters
Authors:
Marco Pulimeno,
Italo Epicoco,
Massimo Cafaro
Abstract:
We present \textsc{P2PTFHH} (Peer--to--Peer Time--Faded Heavy Hitters) which, to the best of our knowledge, is the first distributed algorithm for mining time--faded heavy hitters on unstructured P2P networks. \textsc{P2PTFHH} is based on the \textsc{FDCMSS} (Forward Decay Count--Min Space-Saving) sequential algorithm, and efficiently exploits an averaging gossip protocol, by merging in each interaction the involved peers' underlying data structures. We formally prove the convergence and correctness properties of our distributed algorithm and show that it is fast and simple to implement. Extensive experimental results confirm that \textsc{P2PTFHH} retains the extreme accuracy and error bound provided by \textsc{FDCMSS} whilst showing excellent scalability. Our contributions are three-fold: (i) we prove that the averaging gossip protocol can be used jointly with our augmented sketch data structure for mining time--faded heavy hitters; (ii) we prove the error bounds on frequency estimation; (iii) we experimentally prove that \textsc{P2PTFHH} is extremely accurate and fast, allowing near real time processing of large datasets.
Submitted 1 December, 2018;
originally announced December 2018.
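The averaging gossip protocol that the abstract above builds on can be sketched on plain numbers: in each interaction two peers replace their local values with the pairwise average, which preserves the global sum and drives every peer toward the global mean. The function name, the uniformly random choice of peer pairs (a fully connected overlay), and the scalar state are assumptions for illustration. The actual algorithm merges sketch data structures rather than single numbers.

```python
import random

def gossip_average(values, rounds, seed=0):
    """Run pairwise-averaging gossip and return the peers' final local values."""
    rng = random.Random(seed)
    state = [float(v) for v in values]
    for _ in range(rounds):
        # Pick two distinct peers; both adopt the average of their values.
        i, j = rng.sample(range(len(state)), 2)
        state[i] = state[j] = (state[i] + state[j]) / 2
    return state

if __name__ == "__main__":
    local_counts = [0, 0, 0, 8]  # only one peer has seen the item so far
    print(gossip_average(local_counts, rounds=200))
```

Once the values have converged, each peer can recover an estimate of a global total by multiplying its local average by the number of peers, provided that number is known or estimated separately.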
-
Mining frequent items in unstructured P2P networks
Authors:
Massimo Cafaro,
Italo Epicoco,
Marco Pulimeno
Abstract:
Large-scale decentralized systems, such as P2P, sensor, or IoT device networks, are becoming increasingly common, and require robust protocols to address the challenges posed by the distribution of data and the large number of peers belonging to the network. In this paper, we deal with the problem of mining frequent items in unstructured P2P networks. This problem, of practical importance, has many useful applications. We design P2PSS, a fully decentralized, gossip--based protocol for frequent item discovery, leveraging the Space-Saving algorithm. We formally prove the correctness and theoretical error bound. Extensive experimental results clearly show that P2PSS provides very good accuracy and scalability, even in the presence of highly dynamic P2P networks with churning. To the best of our knowledge, this is the first gossip--based distributed algorithm providing strong theoretical guarantees for both the Approximate Frequent Items Problem in Unstructured P2P Networks and for the frequency estimation of discovered frequent items.
Submitted 16 October, 2018; v1 submitted 18 June, 2018;
originally announced June 2018.
-
Parallel mining of time-faded heavy hitters
Authors:
Massimo Cafaro,
Marco Pulimeno,
Italo Epicoco
Abstract:
We present PFDCMSS, a novel message-passing based parallel algorithm for mining time-faded heavy hitters. The algorithm is a parallel version of the recently published FDCMSS sequential algorithm. We formally prove its correctness by showing that the underlying data structure, a sketch augmented with a Space Saving stream summary holding exactly two counters, is mergeable. Whilst mergeability of traditional sketches derives immediately from theory, we show that merging our augmented sketch is nontrivial. Nonetheless, the resulting parallel algorithm is fast and simple to implement. To the best of our knowledge, PFDCMSS is the first parallel algorithm solving the problem of mining time-faded heavy hitters on message-passing parallel architectures. Extensive experimental results confirm that PFDCMSS retains the extreme accuracy and error bound provided by FDCMSS whilst providing excellent parallel scalability.
Submitted 11 January, 2017;
originally announced January 2017.
-
Fast and Accurate Mining of Correlated Heavy Hitters
Authors:
Italo Epicoco,
Massimo Cafaro,
Marco Pulimeno
Abstract:
The problem of mining Correlated Heavy Hitters (CHH) from a two-dimensional data stream has been introduced recently, and a deterministic algorithm based on the use of the Misra--Gries algorithm has been proposed by Lahiri et al. to solve it. In this paper we present a new counter-based algorithm for tracking CHHs, formally prove its error bounds and correctness and show, through extensive experimental results, that our algorithm outperforms the Misra--Gries based algorithm with regard to accuracy and speed whilst requiring asymptotically much less space.
Submitted 6 April, 2017; v1 submitted 15 November, 2016;
originally announced November 2016.
-
Parallel Space Saving on Multi and Many-Core Processors
Authors:
Massimo Cafaro,
Marco Pulimeno,
Italo Epicoco,
Giovanni Aloisio
Abstract:
Given an array $\mathcal{A}$ of $n$ elements and a value $2 \leq k \leq n$, a frequent item or $k$-majority element is an element occurring in $\mathcal{A}$ more than $n/k$ times. The $k$-majority problem requires finding all of the $k$-majority elements. In this paper we deal with parallel shared-memory algorithms for frequent items; we present a shared-memory version of the Space Saving algorithm and we study its behavior with regard to accuracy and performance on many and multi-core processors, including the Intel Phi accelerator. We also investigate a hybrid MPI/OpenMP version against a pure MPI based version. Through extensive experimental results we prove that the MPI/OpenMP parallel version of the algorithm significantly enhances the performance of the earlier pure MPI version of the same algorithm. Results also prove that for this algorithm the Intel Phi accelerator does not introduce any improvement with respect to the Xeon octa-core processor.
Submitted 11 January, 2017; v1 submitted 15 June, 2016;
originally announced June 2016.
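The $k$-majority problem defined in the abstract above can be illustrated with a minimal sequential sketch of the Space Saving algorithm using $k$ counters: a monitored item has its counter incremented, and an unmonitored item replaces the item with the smallest counter and inherits that count plus one. The function names and the toy stream are hypothetical. The shared-memory and MPI/OpenMP parallelization studied in the paper is not reproduced, and the reported items are candidates whose counts may overestimate true frequencies.

```python
def space_saving(stream, k):
    """Sequential Space Saving with at most k counters; returns item -> count."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the item with the minimum count; the newcomer inherits it.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters

def k_majority_candidates(stream, k):
    """Monitored items whose estimated count exceeds n/k."""
    counters = space_saving(stream, k)
    threshold = len(stream) / k
    return sorted(item for item, c in counters.items() if c > threshold)

if __name__ == "__main__":
    stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
    print(space_saving(stream, k=3))
    print(k_majority_candidates(stream, k=3))
```

A linear scan for the minimum keeps the sketch short. Practical implementations typically use a structure that finds the minimum counter in constant time.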
-
Mining frequent items in the time fading model
Authors:
Massimo Cafaro,
Marco Pulimeno,
Italo Epicoco,
Giovanni Aloisio
Abstract:
We present FDCMSS, a new sketch-based algorithm for mining frequent items in data streams. The algorithm cleverly combines key ideas borrowed from forward decay, the Count-Min and the Space Saving algorithms. It works in the time fading model, mining data streams according to the cash register model. We formally prove its correctness and show, through extensive experimental results, that our algorithm outperforms $λ$-HCount, a recently developed algorithm, with regard to speed, space used, precision attained and error committed on both synthetic and real datasets.
Submitted 2 August, 2016; v1 submitted 15 January, 2016;
originally announced January 2016.
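One ingredient named in the abstract above, forward decay, can be sketched with exact per-item counters: each arrival at time $t_i$ adds $g(t_i - L)$ to the item's running sum, and a query at time $t$ divides by $g(t - L)$, so older arrivals weigh less without ever rescaling stored values. The class name, the exponential choice $g(x) = e^{\lambda x}$, the landmark $L = 0$, and the rate are assumptions for illustration. The sketch does not include the Count-Min or Space Saving components that FDCMSS combines with this idea.

```python
import math

class ForwardDecayCounter:
    """Exact time-faded counts under exponential forward decay."""

    def __init__(self, landmark=0.0, rate=0.1):
        self.landmark = landmark  # reference time L; timestamps must be >= L
        self.rate = rate          # decay rate lambda in g(x) = exp(lambda * x)
        self.sums = {}

    def _g(self, t):
        return math.exp(self.rate * (t - self.landmark))

    def add(self, item, timestamp):
        """Record one occurrence of item at the given time."""
        self.sums[item] = self.sums.get(item, 0.0) + self._g(timestamp)

    def decayed_count(self, item, now):
        """Time-faded count of item as seen at time `now`."""
        return self.sums.get(item, 0.0) / self._g(now)

if __name__ == "__main__":
    counter = ForwardDecayCounter(landmark=0.0, rate=0.1)
    counter.add("x", timestamp=0.0)
    counter.add("x", timestamp=10.0)
    print(counter.decayed_count("x", now=10.0))
```

With an exponential $g$ the stored sums grow with time, so a long-running implementation would need to guard against overflow, for example by periodically moving the landmark.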