-
IRR-Based AS Type of Relationship Inference
Authors:
Amit Zulan,
Omer Miron,
Tal Shapira,
Yuval Shavitt
Abstract:
The Internet comprises tens of thousands of autonomous systems (ASes) whose commercial relationships are not publicly announced. The classification of the Type of Relationship (ToR) between ASes has been extensively studied over the past two decades due to its relevance in network routing management and security.
This paper presents a new approach to ToR classification, leveraging publicly available BGP data from the Internet Routing Registry (IRR). We show how the IRR can be mined and the results refined to achieve a large and accurate ToR database. Using a ground truth database with hundreds of entries we show that we indeed manage to obtain high accuracy. About two-thirds of our ToRs are new, namely, they were not obtained by previous works, which means that we enrich our ToR knowledge with links that are otherwise missed.
Submitted 14 April, 2025;
originally announced April 2025.
-
Measuring DNS Censorship of Generative AI Platforms
Authors:
Harel Berger,
Yuval Shavitt
Abstract:
Generative AI is an invaluable tool; however, in some parts of the world, this technology is censored due to political or societal issues. In this work, we monitor Generative AI censorship through the DNS protocol. We find China to be a leading country of Generative AI censorship. Interestingly, China does not censor all AI domain names. We also report censorship in Russia and find inconsistencies in their process. We compare our results to other measurement platforms (OONI, Censored Planet, GFWatch), and present their lack of data on Generative AI domains.
Submitted 18 December, 2024;
originally announced December 2024.
-
BGP Typo: A Longitudinal Study and Remedies
Authors:
Liron David,
Yuval Shavitt
Abstract:
BGP is the protocol that keeps the Internet connected. Operators use it by announcing Address Prefixes (APs), namely IP address blocks, that they own or that they agree to serve as transit for. BGP enables ISPs to devise complex policies to control what AP announcements to accept (import policy), the route selection, and what AP to announce and to whom (export policy). In addition, BGP is also used for coarse traffic engineering of incoming traffic via the prepend mechanism.
However, there are no widespread good tools for managing BGP, and much of the complex configuration is done by home-brewed scripts or simply by manually configuring routers through a bare-bones terminal interface. This process generates many configuration mistakes.
In this study, we examine typos that propagate in BGP announcements and can be found in many of the public databases. We classify them and quantify their presence, and surprisingly find tens of ASNs and hundreds of APs affected by typos at any given time. In addition, we suggest a simple algorithm that can detect (and clean) most of them with almost no false positives.
Submitted 1 November, 2023;
originally announced November 2023.
-
Detecting Security Patches via Behavioral Data in Code Repositories
Authors:
Nitzan Farhi,
Noam Koenigstein,
Yuval Shavitt
Abstract:
The absolute majority of software today is developed collaboratively using collaborative version control tools such as Git. It is a common practice that once a vulnerability is detected and fixed, the developers behind the software issue a Common Vulnerabilities and Exposures or CVE record to alert the user community of the security hazard and urge them to integrate the security patch. However, some companies might not disclose their vulnerabilities and just update their repository. As a result, users are unaware of the vulnerability and may remain exposed. In this paper, we present a system to automatically identify security patches using only the developer behavior in the Git repository without analyzing the code itself or the remarks that accompanied the fix (commit message). We showed we can reveal concealed security patches with an accuracy of 88.3% and F1 Score of 89.8%. This is the first time that a language-oblivious solution for this problem is presented.
Submitted 4 February, 2023;
originally announced February 2023.
-
Setting the Foundations for PoP-Based Internet Evolution Models
Authors:
Noa Zilberman,
Yuval Shavitt
Abstract:
Developing an evolution model of the Internet has been a long standing research challenge. Such a model can improve the design and placement of communication infrastructure, reducing costs and improving users' quality of experience. While communication infrastructure is tightly coupled to geographical locations, Internet modelling and forecasting in the last decade used network elements that are only loosely bounded to any geographical location. In this paper we set the foundations for developing an evolution model of the Internet based on the Point of Presence (PoP) level. As PoPs have a strong geographical grip they can better represent the evolution of the Internet. We annotate the PoP topologies of the Internet with geographical, economic and demographic information to achieve an understanding of the dynamics of the Internet's structure, in order to identify the constitutive laws of Internet evolution. We identify GDP as the strongest indicator on the country level, and the size of the TV market as the strongest indicator on the US metropolitan level. Finally, we draw attention to the limitations of developing a world-wide evolution model.
Submitted 14 December, 2016; v1 submitted 13 December, 2016;
originally announced December 2016.
-
Optimizing Dijkstra for real-world performance
Authors:
Nimrod Aviram,
Yuval Shavitt
Abstract:
Using Dijkstra's algorithm to compute the shortest paths in a graph from a single source node to all other nodes is common practice in industry and academia. Although the original description of the algorithm advises using a Fibonacci Heap as its internal queue, it has been noted that in practice, a binary (or $d$-ary) heap implementation is significantly faster. This paper introduces an even faster queue design for the algorithm.
Our experimental results currently put our prototype implementation at about twice as fast as the Boost implementation of the algorithm on both real-world and generated large graphs. Furthermore, this preliminary implementation was written in only a few weeks, by a single programmer. The fact that such an early prototype compares favorably against Boost, a well-known open source library developed by expert programmers, gives us reason to believe our design for the queue is indeed better suited to the problem at hand, and the favorable time measurements are not a product of any specific implementation technique we employed.
Submitted 19 May, 2015;
originally announced May 2015.
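For readers unfamiliar with the binary-heap baseline that the abstract above compares against, here is a minimal sketch of Dijkstra's algorithm using Python's standard `heapq` module. It is not the authors' faster queue design, which the abstract does not describe; the function name `dijkstra` and the adjacency-list layout are assumptions chosen for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest paths with a binary heap.

    graph: dict mapping node -> list of (neighbor, weight) pairs,
           with non-negative weights (hypothetical layout for this sketch).
    Returns a dict mapping each reachable node to its distance from source.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        # Skip stale heap entries left behind by earlier, longer paths.
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


example = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
result = dijkstra(example, "a")
```

This variant pushes a new heap entry instead of decreasing a key in place, which is the usual workaround when the heap offers no decrease-key operation.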
-
Stochastic Service Placement
Authors:
Galia Shabtai,
Danny Raz,
Yuval Shavitt
Abstract:
Resource allocation for cloud services is a complex task due to the diversity of the services and the dynamic workloads. One way to address this is by overprovisioning which results in high cost due to the unutilized resources. A much more economical approach, relying on the stochastic nature of the demand, is to allocate just the right amount of resources and use additional more expensive mechanisms in case of overflow situations where demand exceeds the capacity. In this paper we study this approach and show both by comprehensive analysis for independent normal distributed demands and simulation on synthetic data that it is significantly better than currently deployed methods.
Submitted 9 March, 2015;
originally announced March 2015.
-
The Role of Trends in Evolving Networks
Authors:
Osnat Mokryn,
Marcel Blattner,
Yuval Shavitt
Abstract:
Modeling complex networks has been the focus of much research for over a decade. Preferential attachment (PA) is considered a common explanation to the self organization of evolving networks, suggesting that new nodes prefer to attach to more popular nodes. The PA model results in broad degree distributions, found in many networks, but cannot explain other common properties such as: The growth of nodes arriving late and Clustering (community structure). Here we show that when the tendency of networks to adhere to trends is incorporated into the PA model, it can produce networks with such properties. Namely, in trending networks, newly arriving nodes may become central at random, forming new clusters. In particular, we show that when the network is young it is more susceptible to trends, but even older networks may have trendy new nodes that become central in their structure. Alternatively, networks can be seen as composed of two parts: static, governed by a power law degree distribution, and a dynamic part governed by trends, as we show on Wiki pages. Our results also show that the arrival of trending new nodes not only creates new clusters, but also has an effect on the relative importance and centrality of all other nodes in the network. This can explain a variety of real world networks in economics, social and online networks, and cultural networks. Products popularity, formed by the network of people's opinions, exhibit these properties. Some lines of products are increasingly susceptible to trends and hence to shifts in popularity, while others are less trendy and hence more stable. We believe that our findings have a big impact on our understanding of real networks.
Submitted 4 June, 2013;
originally announced June 2013.
-
Topological Trends of Internet Content Providers
Authors:
Yuval Shavitt,
Udi Weinsberg
Abstract:
The Internet is constantly changing, and its hierarchy was recently shown to become flatter. Recent studies of inter-domain traffic showed that large content providers drive this change by bypassing tier-1 networks and reaching closer to their users, enabling them to save transit costs and reduce reliance on transit networks as new services are being deployed, and traffic shaping is becoming increasingly popular.
In this paper we take a first look at the evolving connectivity of large content provider networks, from a topological point of view of the autonomous systems (AS) graph. We perform a 5-year longitudinal study of the topological trends of large content providers, by analyzing several large content providers and comparing these trends to those observed for large tier-1 networks. We study trends in the connectivity of the networks, neighbor diversity and geographical spread, their hierarchy, the adoption of IXPs as a convenient method for peering, and their centrality. Our observations indicate that content providers gradually increase and diversify their connectivity, enabling them to improve their centrality in the graph, and as a result, tier-1 networks lose dominance over time.
Submitted 4 January, 2012;
originally announced January 2012.
-
On the Dynamics of IP Address Allocation and Availability of End-Hosts
Authors:
Oded Argon,
Anat Bremler-Barr,
Osnat Mokryn,
Dvir Schirman,
Yuval Shavitt,
Udi Weinsberg
Abstract:
The availability of end-hosts and their assigned routable IP addresses has an impact on the ability to fight spammers and attackers, and on peer-to-peer application performance. Previous works study the availability of hosts mostly by using either active pinging or by studying access to a mail service; both approaches suffer from inherent inaccuracies. We take a different approach by measuring the IP addresses periodically reported by a uniquely identified group of the hosts running the DIMES agent. This fresh approach provides a chance to measure the true availability of end-hosts and the dynamics of their assigned routable IP addresses. Using a two-month study of 1804 hosts, we find that over 60% of the hosts have a fixed IP address and 90% median availability, while some of the remaining hosts have more than 30 different IPs. For those that have periodically changing IP addresses, we find that the median average period per AS is roughly 24 hours, with a strong relation between the offline time and the probability of altering IP address.
Submitted 10 November, 2010;
originally announced November 2010.
-
A Study of Geolocation Databases
Authors:
Yuval Shavitt,
Noa Zilberman
Abstract:
The geographical location of Internet IP addresses has an importance both for academic research and commercial applications. Thus, both commercial and academic databases and tools are available for mapping IP addresses to geographic locations. Evaluating the accuracy of these mapping services is complex since obtaining diverse large scale ground truth is very hard. In this work we evaluate mapping services using an algorithm that groups IP addresses to PoPs, based on structure and delay. This way we are able to group close to 100,000 IP addresses world wide into groups that are known to share a geo-location with high confidence. We provide insight into the strength and weaknesses of IP geolocation databases, and discuss their accuracy and encountered anomalies.
Submitted 1 July, 2010; v1 submitted 31 May, 2010;
originally announced May 2010.
-
Approximating the Statistics of various Properties in Randomly Weighted Graphs
Authors:
Yuval Emek,
Amos Korman,
Yuval Shavitt
Abstract:
Consider the setting of \emph{randomly weighted graphs}, namely, graphs whose edge weights are chosen independently according to probability distributions with finite support over the non-negative reals. Under this setting, properties of weighted graphs typically become random variables and we are interested in computing their statistical features. Unfortunately, this turns out to be computationally hard for some properties albeit the problem of computing them in the traditional setting of algorithmic graph theory is tractable. For example, there are well known efficient algorithms that compute the \emph{diameter} of a given weighted graph, yet, computing the \emph{expected} diameter of a given randomly weighted graph is \#P-hard even if the edge weights are identically distributed. In this paper, we define a family of properties of weighted graphs and show that for each property in this family, the problem of computing the \emph{$k^{\text{th}}$ moment} (and in particular, the expected value) of the corresponding random variable in a given randomly weighted graph $G$ admits a \emph{fully polynomial time randomized approximation scheme (FPRAS)} for every fixed $k$. This family includes fundamental properties of weighted graphs such as the diameter of $G$, the \emph{radius} of $G$ (with respect to any designated vertex) and the weight of a \emph{minimum spanning tree} of $G$.
Submitted 28 March, 2010; v1 submitted 7 August, 2009;
originally announced August 2009.
-
An $O(\log n)$-approximation for the Set Cover Problem with Set Ownership
Authors:
Mira Gonen,
Yuval Shavitt
Abstract:
In highly distributed Internet measurement systems distributed agents periodically measure the Internet using a tool called {\tt traceroute}, which discovers a path in the network graph. Each agent performs many traceroute measurements to a set of destinations in the network, and thus reveals a portion of the Internet graph as it is seen from the agent locations. In every period we need to check whether previously discovered edges still exist in this period, a process termed {\em validation}. To this end we maintain a database of all the different measurements performed by each agent. Our aim is to be able to {\em validate} the existence of all previously discovered edges in the minimum possible time. In this work we formulate the validation problem as a generalization of the well-known set cover problem. We reduce the set cover problem to the validation problem, thus proving that the validation problem is ${\cal NP}$-hard. We present an $O(\log n)$-approximation algorithm to the validation problem, where $n$ is the number of edges that need to be validated. We also show that unless ${\cal P = NP}$ the approximation ratio of the validation problem is $\Omega(\log n)$.
Submitted 21 July, 2008;
originally announced July 2008.
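As background for the abstract above, the classical greedy heuristic for the plain set cover problem is a standard example of a logarithmic-factor approximation. The sketch below shows only that textbook heuristic; it is not the paper's algorithm for the validation problem with set ownership, and the name `greedy_set_cover` and the input layout are assumptions made for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Classical greedy heuristic for plain set cover.

    universe: iterable of elements to cover.
    subsets:  list of Python sets (hypothetical input layout).
    Returns the indices of the chosen subsets, in the order picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


cover = greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
```

In the measurement setting described in the abstract, the elements would loosely correspond to edges awaiting validation and the subsets to the edges a single measurement can re-check, but that mapping is only an informal reading of the abstract.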
-
Near-Deterministic Inference of AS Relationships
Authors:
Yuval Shavitt,
Eran Shir,
Udi Weinsberg
Abstract:
The discovery of Autonomous Systems (ASes) interconnections and the inference of their commercial Type-of-Relationships (ToR) has been extensively studied during the last few years. The main motivation is to accurately calculate AS-level paths and to provide better topological view of the Internet. An inherent problem in current algorithms is their extensive use of heuristics. Such heuristics incur unbounded errors which are spread over all inferred relationships. We propose a near-deterministic algorithm for solving the ToR inference problem. Our algorithm uses as input the Internet core, which is a dense sub-graph of top-level ASes. We test several methods for creating such a core and demonstrate the robustness of the algorithm to the core's size and density, the inference period, and errors in the core.
We evaluate our algorithm using AS-level paths collected from RouteViews BGP paths and DIMES traceroute measurements. Our proposed algorithm deterministically infers over 95% of the approximately 58,000 AS topology links. The inference becomes stable when using a week worth of data and as little as 20 ASes in the core. The algorithm infers 2-3 times more peer-to-peer relationships in edges discovered only by DIMES than in RouteViews edges, validating the DIMES promise to discover periphery AS edges.
Submitted 28 November, 2007;
originally announced November 2007.
-
New Model of Internet Topology Using k-shell Decomposition
Authors:
Shai Carmi,
Shlomo Havlin,
Scott Kirkpatrick,
Yuval Shavitt,
Eran Shir
Abstract:
We introduce and use k-shell decomposition to investigate the topology of the Internet at the AS level. Our analysis separates the Internet into three sub-components: (a) a nucleus which is a small (~100 nodes) very well connected globally distributed subgraph; (b) a fractal sub-component that is able to connect the bulk of the Internet without congesting the nucleus, with self similar properties and critical exponents; and (c) dendrite-like structures, usually isolated nodes that are connected to the rest of the network through the nucleus only. This unique decomposition is robust, and provides insight into the underlying structure of the Internet and its functional consequences. Our approach is general and useful also when studying other complex networks.
Submitted 17 July, 2006;
originally announced July 2006.
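The k-shell decomposition named in the abstract above is commonly computed by repeatedly pruning low-degree nodes. The following is a minimal sketch of that generic pruning procedure, not the authors' code; the function name `k_shell` and the adjacency-set representation are assumptions for illustration.

```python
def k_shell(adj):
    """Assign each node of an undirected graph its k-shell index.

    adj: dict mapping node -> set of neighbors (hypothetical layout).
    A node is placed in shell k if it is removed while pruning all nodes
    whose remaining degree is at most k.
    """
    degree = {u: len(vs) for u, vs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [u for u in remaining if degree[u] <= k]
        if not peel:
            k += 1  # nothing left to prune at this level; move to the next shell
            continue
        while peel:
            u = peel.pop()
            if u not in remaining:
                continue  # already removed via a duplicate queue entry
            remaining.discard(u)
            shell[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        peel.append(v)
    return shell


# A triangle (a, b, c) with one pendant node d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
shells = k_shell(toy)
```

In this toy graph the pendant node falls in the 1-shell and the triangle forms the 2-shell; in the abstract's terminology, the nodes of the highest shell play the role of the nucleus.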
-
MEDUSA - New Model of Internet Topology Using k-shell Decomposition
Authors:
Shai Carmi,
Shlomo Havlin,
Scott Kirkpatrick,
Yuval Shavitt,
Eran Shir
Abstract:
The k-shell decomposition of a random graph provides a different and more insightful separation of the roles of the different nodes in such a graph than does the usual analysis in terms of node degrees. We develop this approach in order to analyze the Internet's structure at a coarse level, that of the "Autonomous Systems" or ASes, the subnetworks out of which the Internet is assembled. We employ new data from DIMES (see http://www.netdimes.org), a distributed agent-based mapping effort which at present has attracted over 3800 volunteers running more than 7300 DIMES clients in over 85 countries. We combine this data with the AS graph information available from the RouteViews project at Univ. Oregon, and have obtained an Internet map with far more detail than any previous effort.
The data suggests a new picture of the AS-graph structure, which distinguishes a relatively large, redundantly connected core of nearly 100 ASes and two components that flow data in and out from this core. One component is fractally interconnected through peer links; the second makes direct connections to the core only. The model which results has superficial similarities with and important differences from the "Jellyfish" structure proposed by Tauro et al., so we call it a "Medusa." We plan to use this picture as a framework for measuring and extrapolating changes in the Internet's physical structure. Our k-shell analysis may also be relevant for estimating the function of nodes in the "scale-free" graphs extracted from other naturally-occurring processes.
Submitted 11 January, 2006;
originally announced January 2006.
-
DIMES: Let the Internet Measure Itself
Authors:
Yuval Shavitt,
Eran Shir
Abstract:
Today's Internet maps, which are all collected from a small number of vantage points, are falling short of being accurate. We suggest here a paradigm shift for this task. DIMES is a distributed measurement infrastructure for the Internet that is based on the deployment of thousands of light weight measurement agents around the globe.
We describe the rationale behind DIMES deployment, discuss its design trade-offs and algorithmic challenges, and analyze the structure of the Internet as it is seen with DIMES.
Submitted 28 June, 2005;
originally announced June 2005.
-
On the Tomography of Networks and Multicast Trees
Authors:
R. Cohen,
D. Dolev,
S. Havlin,
T. Kalisky,
O. Mokryn,
Y. Shavitt
Abstract:
In this paper we model the tomography of scale free networks by studying the structure of layers around an arbitrary network node. We find, both analytically and empirically, that the distance distribution of all nodes from a specific network node consists of two regimes. The first is characterized by rapid growth, and the second decays exponentially. We also show that the node degree distribution at each layer is a power law with an exponential cut-off. We obtain similar results for the layers surrounding the root of multicast trees cut from such networks, as well as the Internet. All of our results were obtained both analytically and on empirical Internet data.
Submitted 25 May, 2003;
originally announced May 2003.