-
Towards a graph-based foundation model for network traffic analysis
Authors:
Louis Van Langendonck,
Ismael Castell-Uroz,
Pere Barlet-Ros
Abstract:
Foundation models have shown great promise in various fields of study. A potential application of such models is in computer network traffic analysis, where these models can grasp the complexities of network traffic dynamics and adapt to any specific task or network environment with minimal fine-tuning. Previous approaches have used tokenized hex-level packet data and the model architecture of lar…
▽ More
Foundation models have shown great promise in various fields of study. A potential application of such models is in computer network traffic analysis, where these models can grasp the complexities of network traffic dynamics and adapt to any specific task or network environment with minimal fine-tuning. Previous approaches have used tokenized hex-level packet data and the model architecture of large language transformer models. We propose a new, efficient graph-based alternative at the flow-level. Our approach represents network traffic as a dynamic spatio-temporal graph, employing a self-supervised link prediction pretraining task to capture the spatial and temporal dynamics in this network graph framework. To evaluate the effectiveness of our approach, we conduct a few-shot learning experiment for three distinct downstream network tasks: intrusion detection, traffic classification, and botnet classification. Models finetuned from our pretrained base achieve an average performance increase of 6.87\% over training from scratch, demonstrating their ability to effectively learn general network traffic dynamics during pretraining. This success suggests the potential for a large-scale version to serve as an operational foundational model.
△ Less
Submitted 12 September, 2024;
originally announced September 2024.
-
PPT-GNN: A Practical Pre-Trained Spatio-Temporal Graph Neural Network for Network Security
Authors:
Louis Van Langendonck,
Ismael Castell-Uroz,
Pere Barlet-Ros
Abstract:
Recent works have demonstrated the potential of Graph Neural Networks (GNN) for network intrusion detection. Despite their advantages, a significant gap persists between real-world scenarios, where detection speed is critical, and existing proposals, which operate on large graphs representing several hours of traffic. This gap results in unrealistic operational conditions and impractical detection…
▽ More
Recent works have demonstrated the potential of Graph Neural Networks (GNN) for network intrusion detection. Despite their advantages, a significant gap persists between real-world scenarios, where detection speed is critical, and existing proposals, which operate on large graphs representing several hours of traffic. This gap results in unrealistic operational conditions and impractical detection delays. Moreover, existing models do not generalize well across different networks, hampering their deployment in production environments. To address these issues, we introduce PPTGNN, a practical spatio-temporal GNN for intrusion detection. PPTGNN enables near real-time predictions, while better capturing the spatio-temporal dynamics of network attacks. PPTGNN employs self-supervised pre-training for improved performance and reduced dependency on labeled data. We evaluate PPTGNN on three public datasets and show that it significantly outperforms state-of-the-art models, such as E-ResGAT and E-GraphSAGE, with an average accuracy improvement of 10.38%. Finally, we show that a pre-trained PPTGNN can easily be fine-tuned to unseen networks with minimal labeled examples. This highlights the potential of PPTGNN as a general, large-scale pre-trained model that can effectively operate in diverse network environments.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
A first look into Utiq: Next-generation cookies at the ISP level
Authors:
Ismael Castell-Uroz,
Pere Barlet-Ros
Abstract:
Online privacy has become increasingly important in recent years. While third-party cookies have been widely used for years, they have also been criticized for their potential impact on user privacy. They can be used by advertisers to track users across multiple sites, allowing them to build detailed profiles of their behavior and interests. However, nowadays, many browsers allow users to block th…
▽ More
Online privacy has become increasingly important in recent years. While third-party cookies have been widely used for years, they have also been criticized for their potential impact on user privacy. They can be used by advertisers to track users across multiple sites, allowing them to build detailed profiles of their behavior and interests. However, nowadays, many browsers allow users to block third-party cookies, which limits their usefulness for advertisers. In this paper, we take a first look at Utiq, a new way of user tracking performed directly by the ISP, to substitute the third-party cookies used until now. We study the main properties of this new identification methodology and their adoption on the 10K most popular websites. Our results show that, although still marginal due to the restrictions imposed by the system, between 0.7% and 1.2% of websites already include Utiq as one of their user identification methods.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
ASTrack: Automatic Detection and Removal of Web Tracking Code with Minimal Functionality Loss
Authors:
Ismael Castell-Uroz,
Kensuke Fukuda,
Pere Barlet-Ros
Abstract:
Recent advances in web technologies make it more difficult than ever to detect and block web tracking systems. In this work, we propose ASTrack, a novel approach to web tracking detection and removal. ASTrack uses an abstraction of the code structure based on Abstract Syntax Trees to selectively identify web tracking functionality shared across multiple web services. This new methodology allows us…
▽ More
Recent advances in web technologies make it more difficult than ever to detect and block web tracking systems. In this work, we propose ASTrack, a novel approach to web tracking detection and removal. ASTrack uses an abstraction of the code structure based on Abstract Syntax Trees to selectively identify web tracking functionality shared across multiple web services. This new methodology allows us to: (i) effectively detect web tracking code even when using evasion techniques (e.g., obfuscation, minification, or webpackaging); and (ii) safely remove those portions of code related to tracking purposes without affecting the legitimate functionality of the website. Our evaluation with the top 10k most popular Internet domains shows that ASTrack can detect web tracking with high precision (98%), while discovering about 50k tracking code pieces and more than 3,400 new tracking URLs not previously recognized by most popular privacy-preserving tools (e.g., uBlock Origin). Moreover, ASTrack achieved a 36% reduction in functionality loss in comparison with the filter lists, one of the safest options available. Using a novel methodology that combines computer vision and manual inspection, we estimate that full functionality is preserved in more than 97% of the websites.
△ Less
Submitted 25 January, 2023;
originally announced January 2023.
-
Demystifying content-blockers: A large scale study of actual performance gains
Authors:
Ismael Castell-Uroz,
Josep Solé-Pareta,
Pere Barlet-Ros
Abstract:
With the evolution of the online advertisement and tracking ecosystem, content-filtering has become the reference tool for improving the security, privacy and browsing experience when surfing the Internet. It is also commonly believed that using content-blockers to stop unsolicited content decreases the time needed for loading websites. In this work, we perform a large scale study with the 100K mo…
▽ More
With the evolution of the online advertisement and tracking ecosystem, content-filtering has become the reference tool for improving the security, privacy and browsing experience when surfing the Internet. It is also commonly believed that using content-blockers to stop unsolicited content decreases the time needed for loading websites. In this work, we perform a large scale study with the 100K most popular websites on the actual performance improvements of using content-blockers. We focus our study in the two most relevant metrics for user experience; bandwidth and latency. Our results show that using such tools results in small improvements in terms of bandwidth usage but, contrary to popular belief, it has a negligible impact in terms of loading time. We also find that, in the case of small and fast loading websites, the use of content-blockers can even result in increased browsing latency.
△ Less
Submitted 5 February, 2020;
originally announced February 2020.