-
Detection of Sparse Anomalies in High-Dimensional Network Telescope Signals
Authors:
Rafail Kartsioukas,
Rajat Tandon,
Zheng Gao,
Jelena Mirkovic,
Michalis Kallitsis,
Stilian Stoev
Abstract:
Network operators and system administrators are increasingly overwhelmed with incessant cyber-security threats ranging from malicious network reconnaissance to attacks such as distributed denial of service and data breaches. A large number of these attacks could be prevented if the network operators were better equipped with threat intelligence information that would allow them to block or throttl…
▽ More
Network operators and system administrators are increasingly overwhelmed with incessant cyber-security threats ranging from malicious network reconnaissance to attacks such as distributed denial of service and data breaches. A large number of these attacks could be prevented if the network operators were better equipped with threat intelligence information that would allow them to block or throttle nefarious scanning activities. Network telescopes or "darknets" offer a unique window into observing Internet-wide scanners and other malicious entities, and they could offer early warning signals to operators that would be critical for infrastructure protection and/or attack mitigation. A network telescope consists of unused or "dark" IP spaces that serve no users, and solely passively observes any Internet traffic destined to the "telescope sensor" in an attempt to record ubiquitous network scanners, malware that forage for vulnerable devices, and other dubious activities. Hence, monitoring network telescopes for timely detection of coordinated and heavy scanning activities is an important, albeit challenging, task. The challenges mainly arise due to the non-stationarity and the dynamic nature of Internet traffic and, more importantly, the fact that one needs to monitor high-dimensional signals (e.g., all TCP/UDP ports) to search for "sparse" anomalies. We propose statistical methods to address both challenges in an efficient and "online" manner; our work is validated both with synthetic data as well as real-world data from a large network telescope.
△ Less
Submitted 22 June, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Data-adaptive trimming of the Hill estimator and detection of outliers in the extremes of heavy-tailed data
Authors:
Shrijita Bhattacharya,
Michael Kallitsis,
Stilian Stoev
Abstract:
We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the estimator is essentially finite-sample efficient among all unbiased estimators with a given strict upper break-down point. For general heavy-tailed models, we establish the asymptotic normality of the e…
▽ More
We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the estimator is essentially finite-sample efficient among all unbiased estimators with a given strict upper break-down point. For general heavy-tailed models, we establish the asymptotic normality of the estimator under second order regular variation conditions and also show it is minimax rate-optimal in the Hall class of distributions. We also develop an automatic, data-driven method for the choice of the trimming parameter which yields a new type of robust estimator that can adapt to the unknown level of contamination in the extremes. This adaptive robustness property makes our estimator particularly appealing and superior to other robust estimators in the setting where the extremes of the data are contaminated. As an important application of the data-driven selection of the trimming parameters, we obtain a methodology for the principled identification of extreme outliers in heavy tailed data. Indeed, the method has been shown to correctly identify the number of outliers in the previously explored Condroz data set.
△ Less
Submitted 23 August, 2018;
originally announced August 2018.
-
Trimming the Hill estimator: robustness, optimality and adaptivity
Authors:
Shrijita Bhattacharya,
Michael Kallitsis,
Stilian Stoev
Abstract:
We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the estimator is essentially finite-sample efficient among all unbiased estimators with a given strict upper break-down point. For general heavy-tailed models, we establish the asymptotic normality of the e…
▽ More
We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the estimator is essentially finite-sample efficient among all unbiased estimators with a given strict upper break-down point. For general heavy-tailed models, we establish the asymptotic normality of the estimator under second order conditions and discuss its minimax optimal rate in the Hall class. We introduce the so-called trimmed Hill plot, which can be used to select the number of top order statistics to trim. We also develop an automatic, data-driven procedure for the choice of trimming. This results in a new type of robust estimator that can {\em adapt} to the unknown level of contamination in the extremes. As a by-product we also obtain a methodology for identifying extreme outliers in heavy tailed data. The competitive performance of the trimmed Hill and adaptive trimmed Hill estimators is illustrated with simulations.
△ Less
Submitted 14 November, 2017; v1 submitted 8 May, 2017;
originally announced May 2017.
-
Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles
Authors:
Ali Shojaie,
Alexandra Jauhiainen,
Michael Kallitsis,
George Michailidis
Abstract:
Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g.…
▽ More
Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g. wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network.
△ Less
Submitted 2 December, 2013;
originally announced December 2013.
-
A State-Space Approach for Optimal Traffic Monitoring via Network Flow Sampling
Authors:
Michael Kallitsis,
Stilian Stoev,
George Michailidis
Abstract:
The robustness and integrity of IP networks require efficient tools for traffic monitoring and analysis, which scale well with traffic volume and network size. We address the problem of optimal large-scale flow monitoring of computer networks under resource constraints. We propose a stochastic optimization framework where traffic measurements are done by exploiting the spatial (across network link…
▽ More
The robustness and integrity of IP networks require efficient tools for traffic monitoring and analysis, which scale well with traffic volume and network size. We address the problem of optimal large-scale flow monitoring of computer networks under resource constraints. We propose a stochastic optimization framework where traffic measurements are done by exploiting the spatial (across network links) and temporal relationship of traffic flows. Specifically, given the network topology, the state-space characterization of network flows and sampling constraints at each monitoring station, we seek an optimal packet sampling strategy that yields the best traffic volume estimation for all flows of the network. The optimal sampling design is the result of a concave minimization problem; then, Kalman filtering is employed to yield a sequence of traffic estimates for each network flow. We evaluate our algorithm using real-world Internet2 data.
△ Less
Submitted 24 June, 2013;
originally announced June 2013.