-
Technical Report: Generating the WEB-IDS23 Dataset
Authors:
Eric Lanfer,
Dominik Brockmann,
Nils Aschenbruck
Abstract:
Anomaly-based Network Intrusion Detection Systems (NIDS) require correctly labelled, representative, and diverse datasets for accurate evaluation and development. However, several widely used datasets do not include labels which are fine-grained enough and, together with small sample sizes, can lead to overfitting issues that also remain undetected when using test data. Additionally, the cybersecurity sector is evolving fast, and new attack mechanisms require the continuous creation of up-to-date datasets. To address these limitations, we developed a modular traffic generator that can simulate a wide variety of benign and malicious traffic. It incorporates multiple protocols and variability through randomization techniques, and it can produce attacks alongside corresponding benign traffic, as occurs in real-world scenarios. Using the traffic generator, we create a dataset capturing over 12 million samples with 82 flow-level features and 21 fine-grained labels. Additionally, we include several web attack types which are often underrepresented in other datasets.
Submitted 6 February, 2025;
originally announced February 2025.
-
Mens Sana In Corpore Sano: Sound Firmware Corpora for Vulnerability Research
Authors:
René Helmke,
Elmar Padilla,
Nils Aschenbruck
Abstract:
Firmware corpora for vulnerability research should be scientifically sound. Yet, several practical challenges complicate the creation of sound corpora: Sample acquisition, e.g., is hard and one must overcome the barrier of proprietary or encrypted data. As image contents are unknown prior to analysis, it is hard to select high-quality samples that can satisfy scientific demands. Ideally, we help each other out by sharing data. But here, sharing is problematic due to copyright laws. Instead, papers must carefully document each step of corpus creation: If a step is unclear, replicability is jeopardized. This has cascading effects on result verifiability, representativeness, and, thus, soundness.
Despite all challenges, how can we maintain the soundness of firmware corpora? This paper thoroughly analyzes the problem space and investigates its impact on research: We distill practical binary analysis challenges that significantly influence corpus creation. We use these insights to derive guidelines that help researchers to nurture corpus replicability and representativeness. We apply them to 44 top-tier papers and systematically analyze scientific corpus creation practices. Our comprehensive analysis confirms that there is currently no common ground in related work. It shows the added value of our guidelines, as they discover methodical issues in corpus creation and unveil minuscule step stones in documentation. These blur visions on representativeness, hinder replicability, and, thus, negatively impact the soundness of otherwise excellent work.
Finally, we show the feasibility of our guidelines and build a new, replicable corpus for large-scale analyses on Linux firmware: LFwC. We share rich metadata for good (and proven) replicability. We verify unpacking, deduplicate, identify contents, provide ground truth, and show LFwC's utility for research.
Submitted 21 November, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Starlink on the Road: A First Look at Mobile Starlink Performance in Central Europe
Authors:
Dominic Laniewski,
Eric Lanfer,
Simon Beginn,
Jan Dunker,
Michael Dückers,
Nils Aschenbruck
Abstract:
Low Earth Orbit Satellite Networks such as Starlink promise to provide world-wide Internet access. While traditionally designed for stationary use, a new dish, released in April 2023 in Europe, provides mobile Internet access including in-motion usage, e.g., while mounted on a car. In this paper, we design and build a mobile measurement setup. Our goal is to fully autonomously conduct continuous Starlink measurements while the car is in motion. We share our practical experiences, including challenges regarding the permanent power supply. We measure the Starlink performance over the span of two months from mid-January to mid-March 2024 when the car is in motion. The measurements consist of all relevant network parameters, such as the download and upload throughput, the RTT, and packet loss, as well as detailed power consumption data. We analyze our dataset to assess Starlink's mobile performance in Central Europe (Germany) and compare it to stationary measurements in proximity. We find that the mobile performance is significantly worse than stationary performance. The power consumption of the new dish is higher, but seems to be correlated more strongly with the heating function of the dish than with the speed of the vehicle.
Submitted 20 March, 2024;
originally announced March 2024.
-
WetLinks: a Large-Scale Longitudinal Starlink Dataset with Contiguous Weather Data
Authors:
Dominic Laniewski,
Eric Lanfer,
Bernd Meijerink,
Roland van Rijswijk-Deij,
Nils Aschenbruck
Abstract:
Low Earth Orbit (LEO) satellite networks such as Starlink promise Internet access everywhere around the world. In this paper, we present WetLinks, a large and publicly available trace-based dataset of Starlink measurements. The measurements were concurrently collected from two European vantage points over a span of six months. Consisting of approximately 140,000 measurements, the dataset comprises all relevant network parameters such as the upload and download throughputs, the RTT, packet loss, and traceroutes. We further augment the dataset with concurrent data from professional weather stations placed next to both Starlink terminals. Based on our dataset, we analyse Starlink performance, including its susceptibility to weather conditions. We use this to validate our dataset by replicating the results of earlier smaller-scale studies. We release our datasets and all accompanying tooling as open data. To the best of our knowledge, ours is the largest Starlink dataset to date.
Submitted 13 March, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Preprocess your Paths -- Speeding up Linear Programming-based Optimization for Segment Routing Traffic Engineering
Authors:
Alexander Brundiers,
Timmy Schüller,
Nils Aschenbruck
Abstract:
Many state-of-the-art Segment Routing (SR) Traffic Engineering (TE) algorithms rely on Linear Program (LP)-based optimization. However, the poor scalability of the latter and the resulting high computation times impose severe restrictions on the practical usability of such approaches for many use cases. To tackle this problem, a variety of preprocessing approaches have been proposed that aim to reduce computational complexity by preemptively limiting the number of SR paths to consider during optimization. In this paper, we provide the first extensive literature review of existing preprocessing approaches for SR. Based on this, we conduct a large-scale comparative study using various real-world topologies, including recent data from a Tier-1 Internet Service Provider (ISP) backbone. Based on the insights obtained from this evaluation, we finally propose a combination of multiple preprocessing approaches and show that this can reliably reduce computation times by around a factor of 10 or more, without resulting in relevant deterioration of the solution quality. This is a major improvement over the current state-of-the-art and facilitates the reliable usability of LP-based optimization for large segment-routed networks.
Submitted 1 December, 2023;
originally announced December 2023.
-
Green Traffic Engineering by Line Card Minimization
Authors:
Daniel Otten,
Max Ilsen,
Markus Chimani,
Nils Aschenbruck
Abstract:
Green Traffic Engineering encompasses network design and traffic routing strategies that aim at reducing the power consumption of a backbone network. We argue that turning off line cards is the most effective approach to reach this goal. Thus, we investigate the problem of minimizing the number of active line cards in a network while simultaneously allowing a multi-commodity flow to be routed and keeping the maximum link utilization below a certain threshold. In addition to proving this problem to be NP-hard, we present an optimal ILP-based algorithm as well as a heuristic based on 2-Segment Routing. Lastly, we evaluate both approaches on real-world networks obtained from the Repetita Framework and a globally operating Internet Service Provider. The results of this evaluation indicate that our heuristic is not only close to optimal but also significantly faster than the optimal algorithm, making it viable in practice.
Submitted 1 June, 2023;
originally announced June 2023.
-
Green Segment Routing for Improved Sustainability of Backbone Networks
Authors:
Daniel Otten,
Alexander Brundiers,
Timmy Schüller,
Nils Aschenbruck
Abstract:
Improving the energy efficiency of Internet Service Provider (ISP) backbone networks is an important objective for ISP operators. In these networks, the overall traffic load throughout the day can vary drastically, resulting in many backbone networks being highly overprovisioned during periods of lower traffic volume. In this paper, we propose a new Segment Routing (SR)-based optimization algorithm that aims at reducing the energy consumption of networks during such low-traffic periods. It uses the traffic steering capabilities of SR to remove traffic from as many links as possible to allow the respective hardware components to be switched off. Furthermore, it simultaneously ensures that solutions comply with additional operator requirements regarding the overall Maximum Link Utilization in the network. Based on data from a Tier-1 ISP and a publicly available dataset, we show that our approach allows for up to 70% of the overall line cards to be switched off, corresponding to a reduction of around 56% in the overall energy consumption of the network in times of low traffic demands.
Submitted 1 June, 2023;
originally announced June 2023.
-
Improving Proximity Classification for Contact Tracing using a Multi-channel Approach
Authors:
Eric Lanfer,
Thomas Hänel,
Roland van Rijswijk-Deij,
Nils Aschenbruck
Abstract:
Due to the COVID-19 pandemic, smartphone-based proximity tracing systems became of utmost interest. Many of these systems use BLE signals to estimate the distance between two persons. The quality of this method depends on many factors and, therefore, does not always deliver accurate results. In this paper, we present a multi-channel approach to improve proximity classification, and a novel, publicly available data set that contains matched IEEE 802.11 (2.4 GHz and 5 GHz) and BLE signal strength data, measured in four different environments. We have developed and evaluated a combined classification model based on BLE and IEEE 802.11 signals. Our approach significantly improves the distance classification and consequently also the contact tracing accuracy. We are able to achieve good results with our approach in everyday public transport scenarios. However, in our implementation based on IEEE 802.11 probe requests, we also encountered privacy problems and limitations due to the consistency and interval at which such probes are sent. We discuss these limitations and sketch how our approach could be improved to make it suitable for real-world deployment.
Submitted 20 April, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.