-
Lock Prediction for Zero-Downtime Database Encryption
Authors:
Mohamed Sami Rakha,
Adam Sorrenti,
Greg Stager,
Walid Rjaibi,
Andriy Miranskyy
Abstract:
Modern enterprise database systems face significant challenges in balancing data security and performance. Ensuring robust encryption for sensitive information is critical for systems' compliance with security standards. Although holistic database encryption provides strong protection, existing database systems often require a complete backup and restore cycle, resulting in prolonged downtime and increased storage usage. This makes it difficult to implement online encryption techniques in high-throughput environments without disrupting critical operations.
To address this challenge, we envision a solution that enables online database encryption aligned with system activity, eliminating the need for downtime, storage overhead, or full-database reprocessing. Central to this vision is the ability to predict which parts of the database will be accessed next, allowing encryption to be applied online. As a step towards this solution, this study proposes a predictive approach that leverages deep learning models to forecast database lock sequences, using IBM Db2 as the database system under study. In this study, we collected a specialized dataset from TPC-C benchmark workloads, leveraging lock event logs for model training and evaluation. We applied deep learning architectures, such as Transformer and LSTM, to evaluate models for various table-level and page-level lock predictions. We benchmark the accuracy of the trained models versus a Naive Baseline across different prediction horizons and timelines.
Our experiments demonstrate that the proposed deep learning-based models achieve up to 49% average accuracy for table-level and 66% for page-level predictions, outperforming a Naive Baseline. By anticipating which tables and pages will be locked next, the proposed approach is a step toward online encryption, offering a practical path toward secure, low-overhead database systems.
Submitted 30 June, 2025;
originally announced June 2025.
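
As a rough illustration of the evaluation setup described in the abstract above, the sketch below scores a naive next-lock predictor over a sequence of table-level lock events. The abstract does not define its Naive Baseline, so the "repeat the most recent lock" rule, the function names, the sliding-window scoring, and the sample sequence (table names borrowed from the TPC-C schema) are all assumptions made for illustration, not the authors' method.

```python
def naive_predict(history, horizon):
    """Assumed baseline: repeat the last observed lock for every future step."""
    return [history[-1]] * horizon


def horizon_accuracy(sequence, window, horizon):
    """Average per-position accuracy of the naive predictor over sliding windows."""
    correct = 0
    total = 0
    for start in range(len(sequence) - window - horizon + 1):
        history = sequence[start:start + window]
        target = sequence[start + window:start + window + horizon]
        predicted = naive_predict(history, horizon)
        correct += sum(p == t for p, t in zip(predicted, target))
        total += horizon
    return correct / total if total else 0.0


# Hypothetical table-level lock sequence (not data from the paper).
locks = ["WAREHOUSE", "DISTRICT", "DISTRICT", "CUSTOMER",
         "CUSTOMER", "CUSTOMER", "STOCK"]
accuracy = horizon_accuracy(locks, window=2, horizon=1)
```

A learned model such as an LSTM or Transformer would replace `naive_predict` while the scoring loop stays the same, which is what makes the comparison across prediction horizons straightforward.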
-
Temperature Matters: Enhancing Watermark Robustness Against Paraphrasing Attacks
Authors:
Badr Youbi Idrissi,
Monica Millunzi,
Amelia Sorrenti,
Lorenzo Baraldi,
Daryna Dementieva
Abstract:
In the present-day scenario, Large Language Models (LLMs) are establishing their presence as powerful instruments permeating various sectors of society. While their utility offers valuable support to individuals, there are multiple concerns over potential misuse. Consequently, some academic endeavors have sought to introduce watermarking techniques, characterized by the inclusion of markers within machine-generated text, to facilitate algorithmic identification. This research project is focused on the development of a novel methodology for the detection of synthetic text, with the overarching goal of ensuring the ethical application of LLMs in AI-driven text generation. The investigation commences with replicating findings from a previous baseline study, thereby underscoring its susceptibility to variations in the underlying generation model. Subsequently, we propose an innovative watermarking approach and subject it to rigorous evaluation, employing paraphrased generated text to assess its robustness. Experimental results highlight the robustness of our proposal compared to the \cite{aarson} watermarking method.
Submitted 27 June, 2025;
originally announced June 2025.
-
FeDETR: a Federated Approach for Stenosis Detection in Coronary Angiography
Authors:
Raffaele Mineo,
Amelia Sorrenti,
Federica Proietto Salanitri
Abstract:
Assessing the severity of stenoses in coronary angiography is critical to the patient's health, as coronary stenosis is an underlying factor in heart failure. Current practice for grading coronary lesions, i.e. fractional flow reserve (FFR) or instantaneous wave-free ratio (iFR), suffers from several drawbacks, including time, cost and invasiveness, alongside potential interobserver variability. In this context, some deep learning methods have emerged to assist cardiologists in automating the estimation of FFR/iFR values. Despite the effectiveness of these methods, their reliance on large datasets is challenging due to the distributed nature of sensitive medical data. Federated learning addresses this challenge by aggregating knowledge from multiple nodes to improve model generalization, while preserving data privacy. We propose the first federated detection transformer approach, FeDETR, to assess stenosis severity in angiography videos based on the estimation of FFR/iFR values. In our approach, each node trains a detection transformer (DETR) on its local dataset, with the central server federating the backbone part of the network. The proposed method is trained and evaluated on a dataset collected from five hospitals, consisting of 1001 angiographic examinations, and its performance is compared with state-of-the-art federated learning methods.
Submitted 21 September, 2024;
originally announced September 2024.
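
The abstract above states that each node trains locally while the server federates only the backbone. A minimal sketch of that idea follows, assuming a FedAvg-style average weighted by local dataset size; the abstract does not name the aggregation rule, so that choice, the `backbone.` name prefix, the function names, and the toy parameter values are all hypothetical.

```python
def federate_backbone(node_states, node_sizes, prefix="backbone."):
    """Average backbone parameters across nodes, weighted by local dataset size.

    Each node state maps a parameter name to a flat list of floats.
    Parameters whose names do not start with `prefix` are left out,
    so they stay local to each node.
    """
    total = sum(node_sizes)
    shared = {}
    for name in node_states[0]:
        if not name.startswith(prefix):
            continue
        length = len(node_states[0][name])
        shared[name] = [
            sum(state[name][i] * size
                for state, size in zip(node_states, node_sizes)) / total
            for i in range(length)
        ]
    return shared


def apply_shared(local_state, shared):
    """Return a copy of a node's state with the federated backbone written in."""
    merged = dict(local_state)
    merged.update(shared)
    return merged


# Two hypothetical hospitals with 100 and 300 local examinations.
node_a = {"backbone.w": [1.0, 2.0], "head.w": [0.1]}
node_b = {"backbone.w": [3.0, 6.0], "head.w": [0.9]}
shared = federate_backbone([node_a, node_b], [100, 300])
updated_a = apply_shared(node_a, shared)
```

Restricting aggregation to the backbone keeps the amount of exchanged data smaller and lets each node retain its own detection head, while raw angiography data never leaves the node.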
-
On Using Quasirandom Sequences in Machine Learning for Model Weight Initialization
Authors:
Andriy Miranskyy,
Adam Sorrenti,
Viral Thakar
Abstract:
The effectiveness of training neural networks directly impacts computational costs, resource allocation, and model development timelines in machine learning applications. An optimizer's ability to train the model adequately (in terms of trained model performance) depends on the model's initial weights. Model weight initialization schemes use pseudorandom number generators (PRNGs) as a source of randomness.
We investigate whether substituting low-discrepancy quasirandom number generators (QRNGs), namely Sobol' sequences, for PRNGs as a source of randomness for initializers can improve model performance. We examine Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Transformer architectures trained on MNIST, CIFAR-10, and IMDB datasets using SGD and Adam optimizers. Our analysis uses ten initialization schemes: Glorot, He, and LeCun (each in Uniform and Normal variants), plus Orthogonal, Random Normal, Truncated Normal, and Random Uniform. Models with weights set using PRNG- and QRNG-based initializers are compared pairwise for each combination of dataset, architecture, optimizer, and initialization scheme.
Our findings indicate that QRNG-based neural network initializers either reach a higher accuracy or achieve the same accuracy more quickly than PRNG-based initializers in 60% of the 120 experiments conducted. Thus, using QRNG-based initializers instead of PRNG-based initializers can speed up and improve model training.
Submitted 5 August, 2024;
originally announced August 2024.
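
To make the idea of a quasirandom initializer concrete, here is a minimal dependency-free sketch of a Glorot Uniform initializer driven by a low-discrepancy sequence. The paper uses Sobol' sequences; this sketch swaps in the simpler one-dimensional van der Corput (Halton) sequence as a stand-in, and the function names and the row-by-row filling order are assumptions, not the authors' implementation.

```python
import math


def radical_inverse(index, base=2):
    """Van der Corput radical inverse of a positive integer, a value in [0, 1)."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def glorot_uniform_quasirandom(fan_in, fan_out, base=2):
    """Fill a fan_in x fan_out matrix from a low-discrepancy sequence.

    Values are mapped from [0, 1) to [-limit, limit), where
    limit = sqrt(6 / (fan_in + fan_out)) is the Glorot Uniform bound.
    """
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [
            (2.0 * radical_inverse(row * fan_out + col + 1, base) - 1.0) * limit
            for col in range(fan_out)
        ]
        for row in range(fan_in)
    ]


weights = glorot_uniform_quasirandom(fan_in=4, fan_out=3)
```

Unlike a PRNG draw, consecutive points of a low-discrepancy sequence cover the interval evenly, and that even coverage is the property the abstract proposes to exploit.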
-
Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
Authors:
Adam Sorrenti
Abstract:
Separating vocal elements from musical tracks is a longstanding challenge in audio signal processing. This study tackles the distinct separation of vocal components from musical spectrograms. We employ the Short-Time Fourier Transform (STFT) to convert audio waveforms into detailed frequency-time spectrograms, utilizing the benchmark MUSDB18 dataset for music separation. Subsequently, we implement a U-Net neural network to segment the spectrogram image, aiming to delineate and extract singing voice components accurately. We achieved noteworthy results in audio source separation using our U-Net-based models. The combination of frequency-axis normalization with Min/Max scaling and the Mean Absolute Error (MAE) loss function achieved the highest Source-to-Distortion Ratio (SDR) of 7.1 dB, indicating a high level of accuracy in preserving the quality of the original signal during separation. This setup also recorded impressive Source-to-Interference Ratio (SIR) and Source-to-Artifact Ratio (SAR) scores of 25.2 dB and 7.2 dB, respectively. These values significantly outperformed other configurations, particularly those using Quantile-based normalization or a Mean Squared Error (MSE) loss function. Our source code, model weights, and demo material can be found at the project's GitHub repository: https://github.com/mbrotos/SoundSeg
Submitted 30 May, 2024;
originally announced May 2024.
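
The best-performing configuration in the abstract above pairs frequency-axis Min/Max normalization with an MAE loss. The sketch below shows one plausible reading of that pairing on a toy magnitude spectrogram stored as a list of frequency-bin rows. Scaling each frequency bin by its own minimum and maximum over time is an assumption about what "frequency-axis" means here, and the function names and sample values are hypothetical; the linked repository is the authoritative reference.

```python
def minmax_per_frequency(spectrogram, eps=1e-8):
    """Scale each frequency bin (one row) to [0, 1] using its own min and max."""
    scaled = []
    for row in spectrogram:
        low, high = min(row), max(row)
        span = max(high - low, eps)  # avoid division by zero for constant bins
        scaled.append([(value - low) / span for value in row])
    return scaled


def mean_absolute_error(predicted, target):
    """Mean of |predicted - target| over every time-frequency cell."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


# Toy spectrogram: 2 frequency bins x 3 time frames (not real audio).
mixture = [[0.0, 2.0, 4.0], [1.0, 1.0, 1.0]]
normalized = minmax_per_frequency(mixture)
estimate = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
loss = mean_absolute_error(estimate, normalized)
```

Compared with MSE, MAE penalizes large per-cell errors linearly rather than quadratically, which may be one reason the two losses behave differently on spectrograms with a few very loud bins.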
-
Wake-Sleep Consolidated Learning
Authors:
Amelia Sorrenti,
Giovanni Bellitto,
Federica Proietto Salanitri,
Matteo Pennisi,
Simone Palazzo,
Concetto Spampinato
Abstract:
We propose Wake-Sleep Consolidated Learning (WSCL), a learning strategy leveraging Complementary Learning System theory and the wake-sleep phases of the human brain to improve the performance of deep neural networks for visual classification tasks in continual learning settings. Our method learns continually via the synchronization between distinct wake and sleep phases. During the wake phase, the model is exposed to sensory input and adapts its representations, ensuring stability through a dynamic parameter freezing mechanism and storing episodic memories in a short-term temporary memory (similar to what happens in the hippocampus). During the sleep phase, the training process is split into NREM and REM stages. In the NREM stage, the model's synaptic weights are consolidated using replayed samples from the short-term and long-term memory, and the synaptic plasticity mechanism is activated, strengthening important connections and weakening unimportant ones. In the REM stage, the model is exposed to previously unseen realistic visual sensory experience, and the dreaming process is activated, which enables the model to explore the potential feature space, thus preparing synapses for future knowledge. We evaluate the effectiveness of our approach on three benchmark datasets: CIFAR-10, Tiny-ImageNet and FG-ImageNet. In all cases, our method outperforms the baselines and prior work, yielding a significant performance gain on continual visual classification tasks. Furthermore, we demonstrate the usefulness of all processing stages and the importance of dreaming to enable positive forward transfer.
Submitted 6 December, 2023;
originally announced January 2024.