Search | arXiv e-print repository

Estimating the Number of HTTP/3 Responses in QUIC Using Deep Learning

Authors: Barak Gahtan, Robert J. Shahla, Reuven Cohen, Alex M. Bronstein

Abstract: QUIC, a new and increasingly used transport protocol, enhances TCP by offering improved security, performance, and stream multiplexing. These features, however, also impose challenges for network middleboxes that need to monitor and analyze web traffic. This paper proposes a novel method by which an observer can estimate the number of HTTP/3 responses in a given QUIC connection. This estimation reveals server behavior, client-server interactions, and data transmission efficiency, which is crucial for applications such as designing a load-balancing solution and detecting HTTP/3 flood attacks. The proposed scheme transforms QUIC connection traces into image sequences and uses machine learning (ML) models, guided by a tailored loss function, to predict response counts. Evaluations on more than seven million images, derived from 100,000 traces collected across 44,000 websites over four months, achieve up to 97% accuracy in both known and unknown server settings and 92% accuracy on previously unseen complete QUIC traces.

Submitted 28 April, 2025; v1 submitted 8 October, 2024; originally announced October 2024.
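The abstract above says the scheme turns QUIC connection traces into image sequences but does not specify the encoding. The Python sketch below is a hypothetical illustration of that general idea only, not the authors' representation: the `Packet` fields, the 100 ms window, the 32 by 32 grid, the 1500-byte size cap, and the two direction channels are all assumptions made for the example. It uses only metadata an on-path observer can see without decryption (arrival time, size, direction).

```python
from dataclasses import dataclass


@dataclass
class Packet:
    time: float        # seconds since the first packet of the connection (assumed)
    size: int          # bytes on the wire, visible even when payload is encrypted
    from_server: bool  # direction: True for server-to-client


def trace_to_images(packets, window=0.1, time_bins=32, size_bins=32, max_size=1500):
    """Hypothetical encoding: split a trace into fixed-length time windows
    and render each window as a 2-channel count image.

    images[w][c][t][s] counts packets in window w, direction channel c
    (0 = client-to-server, 1 = server-to-client), time bin t, size bin s.
    All parameters are illustrative assumptions, not values from the paper.
    """
    if not packets:
        return []
    n_windows = int(max(p.time for p in packets) // window) + 1
    images = [[[[0] * size_bins for _ in range(time_bins)] for _ in range(2)]
              for _ in range(n_windows)]
    for p in packets:
        w = int(p.time // window)
        # Position inside the window, clamped so float rounding stays in range.
        t = min(int((p.time - w * window) / window * time_bins), time_bins - 1)
        # Packets larger than max_size fall into the last size bin.
        s = min(int(p.size / max_size * size_bins), size_bins - 1)
        images[w][1 if p.from_server else 0][t][s] += 1
    return images
```

Only per-cell counts survive, so the encoding never needs payload access; a model would then map each image, or the sequence of images, to a response count.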

arXiv:2410.03728 [pdf, other]

Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis

Authors: Barak Gahtan, Robert J. Shahla, Alex M. Bronstein, Reuven Cohen

Abstract: The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, collected over four months. Unlike prior datasets, VisQUIC provides SSL keys for controlled decryption, supports multiple QUIC implementations (Chromium QUIC, Facebook's mvfst, Cloudflare's quiche), and introduces a novel image-based representation that enables machine-learning-driven encrypted traffic analysis. The dataset includes standardized benchmarking tools, ensuring reproducibility. To demonstrate VisQUIC's utility, we present a benchmarking task for estimating the number of HTTP/3 responses in encrypted QUIC traffic, achieving 97% accuracy using only observable packet features. By publicly releasing VisQUIC, we provide an open foundation for advancing encrypted traffic analysis, QUIC security research, and network monitoring.

Submitted 24 May, 2025; v1 submitted 30 September, 2024; originally announced October 2024.

Comments: The dataset and the supplementary material can be provided upon request

arXiv:2401.05219 [pdf, other]

Distributed Monitoring for Data Distribution Shifts in Edge-ML Fraud Detection

Authors: Nader Karayanni, Robert J. Shahla, Chieh-Lien Hsiao

Abstract: The digital era has seen a marked increase in financial fraud. Edge ML has emerged as a promising solution for fraud detection in smartphone payment services, enabling ML models to be deployed directly on edge devices. This approach enables more personalized, real-time fraud detection. However, a significant gap in current research is the lack of a robust system for monitoring data distribution shifts in these distributed edge ML applications. Our work bridges this gap by introducing a novel open-source framework designed for continuous monitoring of data distribution shifts on a network of edge devices. Our system includes an innovative calculation of the Kolmogorov-Smirnov (KS) test over a distributed network of edge devices, enabling efficient and accurate monitoring of shifts in user behavior. We comprehensively evaluate the proposed framework on both real-world and synthetic financial transaction datasets and demonstrate its effectiveness.

Submitted 10 January, 2024; originally announced January 2024.
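One way to evaluate a two-sample KS statistic across devices without moving raw data is to have each device bin its values on shared, agreed-upon edges and send only the bin counts; counts from different devices add, and the statistic is taken over the merged cumulative counts. The sketch below illustrates that idea under assumptions of our own (the shared `edges`, the function names, and the reference/current split are hypothetical); it is not the framework's actual calculation. Because the two CDFs are compared only at the bin edges, the result is a lower bound on the exact KS statistic, and it equals the exact value when the edges include every observed value.

```python
from bisect import bisect_left


def local_histogram(values, edges):
    """Run on each device: bin raw values on shared, sorted edges.

    counts[i] holds values v with edges[i-1] < v <= edges[i]
    (no lower bound for i = 0); counts[-1] holds values above edges[-1].
    Only these counts leave the device.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    return counts


def merge(histograms):
    """Run at the aggregator: bin counts from different devices simply add."""
    return [sum(col) for col in zip(*histograms)]


def binned_ks(ref_counts, cur_counts):
    """Two-sample KS statistic sup |F_ref - F_cur|, evaluated at the bin edges only."""
    n_ref, n_cur = sum(ref_counts), sum(cur_counts)
    cum_ref = cum_cur = 0
    d = 0.0
    for a, b in zip(ref_counts, cur_counts):
        cum_ref += a
        cum_cur += b
        d = max(d, abs(cum_ref / n_ref - cum_cur / n_cur))
    return d
```

For example, with edges [2, 4, 6, 8, 10], reference samples 1 to 5 and current samples 6 to 10 (each split across two devices) give 0.8, whereas the exact statistic is 1.0; using every observed value as an edge recovers 1.0.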

arXiv:1708.02787 [pdf, ps, other]

Non-Adaptive Randomized Algorithm for Group Testing

Authors: Nader H. Bshouty, Nuha Diab, Shada R. Kawar, Robert J. Shahla

Abstract: We study the problem of group testing with a non-adaptive randomized algorithm in the random incidence design (RID) model, where each entry in the test is chosen randomly and independently from $\{0,1\}$ with a fixed probability $p$. The property that is necessary and sufficient for unique decoding is the separability of the tests, but unfortunately no linear-time decoding algorithm is known for such tests. To achieve linear-time decodable tests, the algorithms in the literature use the disjunction property, which gives an almost optimal number of tests. We define a new property for the tests, which we call the semi-disjunction property. We show that such tests admit linear-time decoding and that, as the number of defective items $d\to\infty$, the number of tests converges to the number of tests with the separability property and is therefore optimal (in the RID model). Our analysis shows that, in the RID model, our algorithm requires fewer tests than the one with the disjunction property, even for small $d$.

Submitted 9 August, 2017; originally announced August 2017.
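For context, the sketch below draws a test matrix from the RID model described above (item $j$ is placed in test $i$ independently with probability $p$) and decodes it with the classic linear-time rule associated with the disjunction property, often called COMP: any item that appears in a negative test is ruled out, and the survivors are reported as defective. This decoder never misses a defective item, and with a $d$-disjunct matrix and at most $d$ defectives it recovers the defective set exactly. It is a baseline for comparison, not the paper's semi-disjunction decoder; the function names and parameter values are illustrative.

```python
import random


def rid_matrix(n_items, n_tests, p, rng):
    """RID model: item j is in test i independently with probability p."""
    return [[rng.random() < p for _ in range(n_items)] for _ in range(n_tests)]


def run_tests(matrix, defectives):
    """A pooled test is positive iff it contains at least one defective item."""
    return [any(row[j] for j in defectives) for row in matrix]


def disjunct_decode(matrix, results):
    """Rule out every item that appears in a negative test (COMP decoding).

    One pass over the matrix, i.e. linear in its size. Defective items only
    ever appear in positive tests, so they are never ruled out and the
    output always contains the true defective set.
    """
    n_items = len(matrix[0])
    alive = [True] * n_items
    for row, positive in zip(matrix, results):
        if not positive:
            for j, in_test in enumerate(row):
                if in_test:
                    alive[j] = False
    return {j for j in range(n_items) if alive[j]}
```

With enough tests, every non-defective item lands in some negative test with high probability, so the decoded set shrinks to exactly the defectives.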

arXiv:1602.05032 [pdf, other]

Enumerating all the Irreducible Polynomials over Finite Field

Authors: Nader H. Bshouty, Nuha Diab, Shada R. Kawar, Robert J. Shahla

Abstract: In this paper we give a detailed analysis of deterministic and randomized algorithms that enumerate any number of irreducible polynomials of degree $n$ over a finite field and their roots in the extension field in quasilinear time cost per element, where $N=n^2$ is the size of the output. Our algorithm is based on an improved algorithm for enumerating all the Lyndon words of length $n$ in linear delay time and the known reduction of Lyndon words to irreducible polynomials.

Submitted 11 August, 2016; v1 submitted 16 February, 2016; originally announced February 2016.
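The abstract relies on the classical fact that Lyndon words of length $n$ over a $q$-letter alphabet are equinumerous with the monic irreducible polynomials of degree $n$ over $\mathbb{F}_q$, both counted by $\frac{1}{n}\sum_{d\mid n}\mu(d)\,q^{n/d}$. The Python sketch below enumerates Lyndon words with Duval's algorithm (a standard generator, not necessarily the paper's improved linear-delay algorithm) and checks the count against that formula; the reduction from words to polynomials itself is not shown, and the function names are illustrative.

```python
def lyndon_words(n, q):
    """Duval's algorithm: yield all Lyndon words of length exactly n
    over the alphabet {0, ..., q-1}, in lexicographic order."""
    w = [-1]
    while w:
        w[-1] += 1                  # increment the last symbol
        m = len(w)
        if m == n:
            yield tuple(w)
        while len(w) < n:           # extend periodically to length n
            w.append(w[len(w) - m])
        while w and w[-1] == q - 1: # strip trailing maximal symbols
            w.pop()


def mobius(k):
    """Moebius function mu(k) by trial division."""
    result, p = 1, 2
    while p * p <= k:
        if k % p == 0:
            k //= p
            if k % p == 0:
                return 0            # square factor
            result = -result
        p += 1
    return -result if k > 1 else result


def irreducible_count(n, q):
    """Number of monic irreducible polynomials of degree n over F_q (Gauss's formula)."""
    total = sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n
```

For $q=2$, $n=4$ the generator yields 0001, 0011, and 0111, matching the three irreducible binary quartics $x^4+x+1$, $x^4+x^3+1$, and $x^4+x^3+x^2+x+1$.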

Showing 1–5 of 5 results for author: Shahla, R J