Anonymized Network Sensing Graph Challenge
Authors:
Hayden Jananthan,
Michael Jones,
William Arcand,
David Bestor,
William Bergeron,
Daniel Burrill,
Aydin Buluc,
Chansup Byun,
Timothy Davis,
Vijay Gadepally,
Daniel Grant,
Michael Houle,
Matthew Hubbell,
Piotr Luszczek,
Peter Michaleas,
Lauren Milechin,
Chasen Milner,
Guillermo Morales,
Andrew Morris,
Julie Mullen,
Ritesh Patel,
Alex Pentland,
Sandeep Pisharody,
Andrew Prout,
Albert Reuther,
et al. (4 additional authors not shown)
Abstract:
The MIT/IEEE/Amazon GraphChallenge encourages community approaches to developing new solutions for analyzing graphs and sparse data derived from social media, sensor feeds, and scientific data to discover relationships between events as they unfold in the field. The anonymized network sensing Graph Challenge seeks to enable large, open, community-based approaches to protecting networks. Many large-scale networking problems can only be solved with community access to very broad data sets with the highest regard for privacy and strong community buy-in. Such approaches often require community-based data sharing. In the broader networking community (commercial, federal, and academic), anonymized source-to-destination traffic matrices with standard data sharing agreements have emerged as a data product that can meet many of these requirements. This challenge provides an opportunity to highlight novel approaches for optimizing the construction and analysis of anonymized traffic matrices using over 100 billion network packets derived from the largest Internet telescope in the world (CAIDA). This challenge specifies the anonymization, construction, and analysis of these traffic matrices. A GraphBLAS reference implementation is provided, but the use of GraphBLAS is not required in this Graph Challenge. As with prior Graph Challenges, the goal is to provide a well-defined context for demonstrating innovation. Graph Challenge participants are free to select (with accompanying explanation) the Graph Challenge elements that are appropriate for highlighting their innovations.
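As a rough illustration of what an anonymized source-to-destination traffic matrix is, the sketch below counts packets per (source, destination) pair after replacing each address with a keyed hash. It is a minimal sketch under stated assumptions, not the challenge's reference implementation: the keyed-hash anonymization, the function names (`anonymize`, `traffic_matrix`, `summarize`), the sample packets, and the summary fields are all hypothetical, and the challenge specifies its own anonymization and analysis steps.

```python
import hashlib
import hmac
from collections import Counter


def anonymize(ip: str, key: bytes) -> str:
    """Replace an address with a keyed hash.

    Assumption: a stand-in scheme for illustration only, not the
    anonymization specified by the challenge.
    """
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def traffic_matrix(packets, key: bytes) -> Counter:
    """Build a sparse matrix of packet counts keyed by (anon_src, anon_dst)."""
    matrix = Counter()
    for src, dst in packets:
        matrix[(anonymize(src, key), anonymize(dst, key))] += 1
    return matrix


def summarize(matrix: Counter) -> dict:
    """Compute a few aggregate quantities from the sparse matrix."""
    sources = {src for src, _ in matrix}
    destinations = {dst for _, dst in matrix}
    return {
        "packets": sum(matrix.values()),
        "unique_links": len(matrix),
        "unique_sources": len(sources),
        "unique_destinations": len(destinations),
        "max_link_packets": max(matrix.values(), default=0),
    }


# Hypothetical sample traffic: (source address, destination address) per packet.
packets = [
    ("10.0.0.1", "192.0.2.7"),
    ("10.0.0.1", "192.0.2.7"),
    ("10.0.0.2", "192.0.2.7"),
    ("10.0.0.1", "198.51.100.3"),
]

stats = summarize(traffic_matrix(packets, key=b"hypothetical-secret"))
print(stats)
```

Only the nonzero entries are stored, which is the property that makes a sparse representation attractive when most address pairs never communicate. The dictionary-based `Counter` here is chosen for readability and is not meant to reflect the performance of a GraphBLAS implementation.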
Submitted 26 June, 2025; v1 submitted 12 September, 2024;
originally announced September 2024.
Focusing and Calibration of Large Scale Network Sensors using GraphBLAS Anonymized Hypersparse Matrices
Authors:
Jeremy Kepner,
Michael Jones,
Phil Dykstra,
Chansup Byun,
Timothy Davis,
Hayden Jananthan,
William Arcand,
David Bestor,
William Bergeron,
Vijay Gadepally,
Michael Houle,
Matthew Hubbell,
Anna Klein,
Lauren Milechin,
Guillermo Morales,
Julie Mullen,
Ritesh Patel,
Alex Pentland,
Sandeep Pisharody,
Andrew Prout,
Albert Reuther,
Antonio Rosa,
Siddharth Samsi,
Tyler Trigg,
Charles Yee,
et al. (1 additional author not shown)
Abstract:
Defending community-owned cyberspace requires community-based efforts. Large-scale network observations that uphold the highest regard for privacy are key to protecting our shared cyberspace. Deployment of the necessary network sensors requires careful sensor placement, focusing, and calibration with significant volumes of network observations. This paper demonstrates novel focusing and calibration procedures on a multi-billion-packet data set using high-performance GraphBLAS anonymized hypersparse matrices. The run-time performance on a real-world data set confirms previously observed real-time processing rates for high-bandwidth links while achieving significant data compression. The output of the analysis demonstrates the effectiveness of these procedures at focusing the traffic matrix and revealing the underlying stable heavy-tail statistical distributions that are necessary for anomaly detection. A simple model of the corresponding probability of detection ($p_{\rm d}$) and probability of false alarm ($p_{\rm fa}$) for these distributions highlights the criticality of network sensor focusing and calibration. Once a sensor is properly focused and calibrated, it is in a position to carry out two of the central tenets of good cybersecurity: (1) continuous observation of the network and (2) minimizing unbrokered network connections.
Submitted 4 September, 2023;
originally announced September 2023.