-
The Multifractal IP Address Structure: Physical Explanation and Implications
Authors:
Chris Misa,
Ram Durairajan,
Arpit Gupta,
Reza Rejaie,
Walter Willinger
Abstract:
The structure of IP addresses observed in Internet traffic plays a critical role in a wide range of networking problems of current interest. For example, modern network telemetry systems that take advantage of existing data plane technologies for line rate traffic monitoring and processing cannot afford to waste precious data plane resources on traffic that comes from "uninteresting" regions of the IP address space. However, there is currently no well-established structural model or analysis toolbox that enables a first-principles approach to the specific problem of identifying "uninteresting" regions of the address space, or to the myriad other networking problems that prominently feature IP addresses.
To address this key missing piece, we present in this paper a first-of-its-kind empirically validated physical explanation for why the observed IP address structure in measured Internet traffic is multifractal in nature. Our root cause analysis overcomes key limitations of mostly forgotten findings from ~20 years ago and demonstrates that the Internet processes and mechanisms responsible for how IP addresses are allocated, assigned, and used in today's Internet are consistent with and well modeled by a class of evocative mathematical models called conservative cascades. We complement this root cause analysis with the development of an improved toolbox that is tailor-made for analyzing finite and discrete sets of IP addresses and includes statistical estimators that engender high confidence in the inferences they produce. We illustrate the use of this toolbox in the context of a novel address structure anomaly detection method we designed and conclude with a discussion of a range of challenging open networking problems that are motivated or inspired by our findings.
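To make the term "conservative cascade" concrete, here is a minimal Python sketch of a generic binary conservative cascade over address prefixes: each prefix splits its mass between its two child prefixes by a random fraction W and 1 - W, so total mass is conserved exactly at every level. It also computes the partition function Z(q, k), the sum over length-k prefixes of each prefix's mass raised to the power q. The multiplier distribution (uniform on [0.2, 0.8]), the 16-bit depth, and all function names are illustrative assumptions, not the paper's fitted model or its toolbox.

```python
import random


def conservative_cascade(depth, rng, low=0.2, high=0.8):
    """Generate 2**depth leaf masses via a binary conservative cascade.

    Each node splits its mass m into m*w and m*(1 - w), with w drawn
    uniformly from [low, high] (an illustrative assumption), so the total
    mass stays 1 (up to floating-point rounding) at every level.
    """
    masses = [1.0]
    for _ in range(depth):
        children = []
        for m in masses:
            w = rng.uniform(low, high)
            children.append(m * w)          # child prefix with next bit = 0
            children.append(m * (1.0 - w))  # child prefix with next bit = 1
        masses = children
    return masses


def aggregate(masses, level):
    """Sum leaf masses into the 2**level prefixes of length `level`."""
    block = len(masses) // (2 ** level)
    return [sum(masses[i * block:(i + 1) * block]) for i in range(2 ** level)]


def partition_function(masses, level, q):
    """Z(q, level): sum over length-`level` prefixes of (prefix mass)**q."""
    return sum(m ** q for m in aggregate(masses, level) if m > 0)


if __name__ == "__main__":
    rng = random.Random(42)
    leaves = conservative_cascade(16, rng)  # 16 bits keeps the demo small
    for level in (4, 8, 12, 16):
        print(level, partition_function(leaves, level, 2.0))
```

On real traces, the synthetic leaf masses would be replaced by per-prefix address or packet counts. For multifractal measures, the slope of log2 Z(q, k) versus k, viewed as a function of q, is nonlinear, whereas a uniform measure gives a slope linear in q.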
Submitted 2 April, 2025;
originally announced April 2025.
-
In Search of netUnicorn: A Data-Collection Platform to Develop Generalizable ML Models for Network Security Problems
Authors:
Roman Beltiukov,
Wenbo Guo,
Arpit Gupta,
Walter Willinger
Abstract:
The remarkable success of machine learning-based solutions for network security problems has been impeded by the inability of the resulting ML models to maintain efficacy when used in network environments that exhibit different network behaviors. This issue is commonly referred to as the generalizability problem of ML models. The community has recognized the critical role that training datasets play in this context and has developed various techniques to improve dataset curation to overcome this problem. Unfortunately, these methods are generally ill-suited or even counterproductive in the network security domain, where they often result in unrealistic or poor-quality datasets.
To address this issue, we propose an augmented ML pipeline that leverages explainable ML tools to guide the network data collection in an iterative fashion. To ensure the data's realism and quality, we require that new datasets be collected endogenously in this iterative process, thus advocating for a gradual removal of data-related problems to improve model generalizability. To realize this capability, we develop a data-collection platform, netUnicorn, that takes inspiration from the classic "hourglass" model and is implemented as its "thin waist" to simplify data collection for different learning problems from diverse network environments. The proposed system decouples data-collection intents from the deployment mechanisms and disaggregates these high-level intents into smaller, reusable, self-contained tasks.
We demonstrate how netUnicorn simplifies collecting data for different learning problems from multiple network environments and how the proposed iterative data collection improves a model's generalizability.
Submitted 10 September, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
DynamiQ: Planning for Dynamics in Network Streaming Analytics Systems
Authors:
Rohan Bhatia,
Arpit Gupta,
Rob Harrison,
Daniel Lokshtanov,
Walter Willinger
Abstract:
The emergence of programmable data-plane targets has motivated a new hybrid design for network streaming analytics systems that combine these targets' fast packet processing speeds with the rich compute resources available at modern stream processors. However, these systems require careful query planning; that is, specifying the minute details of executing a given set of queries in a way that makes the best use of the limited resources and programmability offered by data-plane targets. We use one such existing system, Sonata, and real-world packet traces to understand how executing a fixed query workload is affected by the unknown dynamics of the traffic that defines the target's input workload. We observe that static query planning, as employed by Sonata, cannot handle even small changes in the input workload, wasting data-plane resources to the point where query execution is confined mainly to userspace.
This paper presents the design and implementation of DynamiQ, a new network streaming analytics platform that employs dynamic query planning to deal with the dynamics of real-world input workloads. Specifically, we develop a suite of practical algorithms for (i) computing effective initial query plans (to start query execution) and (ii) enabling efficient updating of portions of such an initial query plan at runtime (to adapt to changes in the input workload). Using real-world packet traces as input workload, we show that compared to Sonata, DynamiQ reduces the stream processor's workload by two orders of magnitude.
Submitted 9 June, 2021;
originally announced June 2021.
-
GreyFiber: A System for Providing Flexible Access to Wide-Area Connectivity
Authors:
Ramakrishnan Durairajan,
Paul Barford,
Joel Sommers,
Walter Willinger
Abstract:
Access to fiber-optic connectivity in the Internet is traditionally offered either via lit circuits or dark fiber. Economic (capex vs. opex) and operational considerations (latency, capacity) dictate the choice between these two offerings, but neither may effectively address the specific needs of modern-day enterprises or service providers over a range of use scenarios. In this paper, we describe a new approach for fiber-optic connectivity in the Internet that we call GreyFiber. The core idea of GreyFiber is to offer flexible access to fiber-optic paths between end points (e.g., datacenters or colocation facilities) over a range of timescales. We identify and discuss operational issues and systems challenges that need to be addressed to make GreyFiber a viable and realistic option for offering flexible access to infrastructure (similar to cloud computing). We investigate the efficacy of GreyFiber with a prototype implementation deployed in the GENI and CloudLab testbeds. Our scaling experiments show that 50 circuits can be provisioned within a minute. We also show that backup paths can be provisioned 28 times faster than an OSPF-based solution during failure/maintenance events. Our experiments also examine GreyFiber overhead demands and show that the time spent in circuit creation is dependent on the network infrastructure, indicating avenues for future improvements.
Submitted 13 July, 2018;
originally announced July 2018.
-
Sonata: Query-Driven Network Telemetry
Authors:
Arpit Gupta,
Rob Harrison,
Ankita Pawar,
Rüdiger Birkner,
Marco Canini,
Nick Feamster,
Jennifer Rexford,
Walter Willinger
Abstract:
Operating networks depends on collecting and analyzing measurement data. Current technologies do not make it easy to do so, typically because they separate data collection (e.g., packet capture or flow monitoring) from analysis, producing either too much data to answer a general question or too little data to answer a detailed question. In this paper, we present Sonata, a network telemetry system that uses a uniform query interface to drive the joint collection and analysis of network traffic. Sonata takes advantage of two emerging technologies---streaming analytics platforms and programmable network devices---to facilitate joint collection and analysis. Sonata allows operators to more directly express network traffic analysis tasks in a high-level language. The underlying runtime partitions each query into a portion that runs on the switch and another that runs on the streaming analytics platform, iteratively refines the query to efficiently capture only the traffic that pertains to the operator's query, and exploits sketches to reduce state in switches in exchange for more approximate results. Through an evaluation of a prototype implementation, we demonstrate that Sonata can support a wide range of network telemetry tasks with less state in the network, and lower data rates to streaming analytics systems, than current approaches can achieve.
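The trade-off named above (less switch state in exchange for approximate results) can be illustrated with a count-min sketch, one common sketch for per-key counts. The following is a generic Python sketch, not Sonata's data-plane implementation; the class name, the SHA-256-based per-row hashing, the width/depth values, and the example keys are assumptions for illustration. Estimates never undercount, because hash collisions can only inflate a counter.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key counters stored in a fixed depth x width table."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One hash per row, derived by prefixing the row number (illustrative choice).
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def estimate(self, key):
        # Every row can only overestimate, so the minimum is the tightest bound.
        return min(self.rows[r][self._index(key, r)] for r in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(width=256, depth=4)
    # Hypothetical stream: destination IPs of observed packets.
    stream = ["10.0.0.1"] * 50 + ["10.0.0.2"] * 5 + [f"192.0.2.{i}" for i in range(100)]
    for dst in stream:
        cms.update(dst)
    print(cms.estimate("10.0.0.1"), cms.estimate("10.0.0.2"))
```

Memory stays fixed at width × depth counters no matter how many distinct keys appear; wider rows reduce collision error, and more rows reduce the chance that every row collides.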
Submitted 2 May, 2017;
originally announced May 2017.
-
The Workshop on Internet Topology (WIT) Report
Authors:
Dmitri Krioukov,
Fan Chung,
kc claffy,
Marina Fomenkov,
Alessandro Vespignani,
Walter Willinger
Abstract:
Internet topology analysis has recently experienced a surge of interest in computer science, physics, and the mathematical sciences. However, researchers from these different disciplines tend to approach the same problem from different angles. As a result, the field of Internet topology analysis and modeling must untangle sets of inconsistent findings, conflicting claims, and contradicting statements.
On May 10-12, 2006, CAIDA hosted the Workshop on Internet Topology (WIT). By bringing together a group of researchers spanning the areas of computer science, physics, and the mathematical sciences, the workshop aimed to improve communication across these scientific disciplines, enable interdisciplinary cross-fertilization, identify commonalities in the different approaches, promote synergy where it exists, and utilize the richness that results from exploring similar problems from multiple perspectives.
This report describes the findings of the workshop, outlines a set of relevant open research problems identified by participants, and concludes with recommendations that can benefit all scientific communities interested in Internet topology research.
Submitted 7 December, 2006;
originally announced December 2006.
-
Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications (Extended Version)
Authors:
Lun Li,
David Alderson,
Reiko Tanaka,
John C. Doyle,
Walter Willinger
Abstract:
Although the "scale-free" literature is large and growing, it gives neither a precise definition of scale-free graphs nor rigorous proofs of many of their claimed properties. In fact, it is easily shown that the existing theory has many inherent contradictions and verifiably false claims. In this paper, we propose a new, mathematically precise, and structural definition of the extent to which a graph is scale-free, and prove a series of results that recover many of the claimed properties while suggesting the potential for a rich and interesting theory. With this definition, scale-free (or its opposite, scale-rich) is closely related to other structural graph properties such as various notions of self-similarity (or respectively, self-dissimilarity). Scale-free graphs are also shown to be the likely outcome of random construction processes, consistent with the heuristic definitions implicit in existing random graph approaches. Our approach clarifies much of the confusion surrounding the sensational qualitative claims in the scale-free literature, and offers rigorous and quantitative alternatives.
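The abstract does not spell out the structural definition. As we understand the full paper, it is built on the s-metric, s(g) = Σ over edges (i, j) of d_i · d_j (the product of endpoint degrees), normalized by its maximum over graphs with the same degree sequence; large values indicate that high-degree nodes attach to each other. Below is a minimal Python sketch of the unnormalized s(g) only. Computing the normalizing maximum is omitted, and the function name is ours.

```python
from collections import Counter


def s_metric(edges):
    """Unnormalized s-metric: sum over edges (i, j) of deg(i) * deg(j).

    Assumes a simple undirected graph given as a list of (u, v) pairs
    with no duplicate edges and no self-loops.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[u] * degree[v] for u, v in edges)


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]  # one hub of degree 4
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]  # degrees 1, 2, 2, 2, 1
    print(s_metric(star), s_metric(path))  # 16 12
```

Raw s values are only comparable among graphs that share a degree sequence, which is why a normalization by the maximum is needed; the star and path above have different degree sequences and serve only to check the arithmetic.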
Submitted 18 October, 2005; v1 submitted 8 January, 2005;
originally announced January 2005.