-
Standardization of Weighted Ranking Correlation Coefficients
Authors:
Pierangelo Lombardo
Abstract:
A relevant problem in statistics is defining the correlation of two rankings of a list of items. Kendall's tau and Spearman's rho are two well established correlation coefficients, characterized by a symmetric form that ensures zero expected value between two pairs of rankings randomly chosen with uniform probability. However, in recent years, several weighted versions of the original Spearman and…
▽ More
A relevant problem in statistics is defining the correlation of two rankings of a list of items. Kendall's tau and Spearman's rho are two well established correlation coefficients, characterized by a symmetric form that ensures zero expected value between two pairs of rankings randomly chosen with uniform probability. However, in recent years, several weighted versions of the original Spearman and Kendall coefficients have emerged that take into account the greater importance of top ranks compared to low ranks, which is common in many contexts. The weighting schemes break the symmetry, causing a non-zero expected value between two random rankings. This issue is very relevant, as it undermines the concept of uncorrelation between rankings. In this paper, we address this problem by proposing a standardization function $g(x)$ that maps a correlation ranking coefficient $Γ$ in a standard form $g(Γ)$ that has zero expected value, while maintaining the relevant statistical properties of $Γ$.
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
RadField3D: A Data Generator and Data Format for Deep Learning in Radiation-Protection Dosimetry for Medical Applications
Authors:
Felix Lehner,
Pasquale Lombardo,
Susana Castillo,
Oliver Hupe,
Marcus Magnor
Abstract:
In this research work, we present our open-source Geant4-based Monte-Carlo simulation application, called RadField3D, for generating threedimensional radiation field datasets for dosimetry. Accompanying, we introduce a fast, machine-interpretable data format with a Python API for easy integration into neural network research, that we call RadFiled3D. Both developments are intended to be used to re…
▽ More
In this research work, we present our open-source Geant4-based Monte-Carlo simulation application, called RadField3D, for generating threedimensional radiation field datasets for dosimetry. Accompanying, we introduce a fast, machine-interpretable data format with a Python API for easy integration into neural network research, that we call RadFiled3D. Both developments are intended to be used to research alternative radiation simulation methods using deep learning.
△ Less
Submitted 18 December, 2024;
originally announced December 2024.
-
Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models
Authors:
Pierangelo Lombardo,
Alessio Boiardi,
Luca Colombo,
Angelo Schiavone,
Nicolò Tamagnone
Abstract:
The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models are required to rank words or texts according to their semantic relatedness to a given concept, with particular focus on top ranks.…
▽ More
The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models are required to rank words or texts according to their semantic relatedness to a given concept, with particular focus on top ranks. In this work, we give a threefold contribution to address these requirements: (i) we define a protocol for the construction, based on adaptive pairwise comparisons, of a relatedness-based evaluation dataset tailored on the available resources and optimized to be particularly accurate in top-rank evaluation; (ii) we define appropriate metrics, extensions of well-known ranking correlation coefficients, to evaluate a semantic model via the aforementioned dataset by taking into account the greater significance of top ranks. Finally, (iii) we define a stochastic transitivity model to simulate semantic-driven pairwise comparisons, which confirms the effectiveness of the proposed dataset construction protocol.
△ Less
Submitted 9 October, 2020;
originally announced October 2020.
-
DNS Covert Channel Detection via Behavioral Analysis: a Machine Learning Approach
Authors:
Salvatore Saeli,
Federica Bisio,
Pierangelo Lombardo,
Danilo Massa
Abstract:
Detecting covert channels among legitimate traffic represents a severe challenge due to the high heterogeneity of networks. Therefore, we propose an effective covert channel detection method, based on the analysis of DNS network data passively extracted from a network monitoring system. The framework is based on a machine learning module and on the extraction of specific anomaly indicators able to…
▽ More
Detecting covert channels among legitimate traffic represents a severe challenge due to the high heterogeneity of networks. Therefore, we propose an effective covert channel detection method, based on the analysis of DNS network data passively extracted from a network monitoring system. The framework is based on a machine learning module and on the extraction of specific anomaly indicators able to describe the problem at hand. The contribution of this paper is two-fold: (i) the machine learning models encompass network profiles tailored to the network users, and not to the single query events, hence allowing for the creation of behavioral profiles and spotting possible deviations from the normal baseline; (ii) models are created in an unsupervised mode, thus allowing for the identification of zero-days attacks and avoiding the requirement of signatures or heuristics for new variants. The proposed solution has been evaluated over a 15-day-long experimental session with the injection of traffic that covers the most relevant exfiltration and tunneling attacks: all the malicious variants were detected, while producing a low false-positive rate during the same period.
△ Less
Submitted 4 October, 2020;
originally announced October 2020.
-
Fast Flux Service Network Detection via Data Mining on Passive DNS Traffic
Authors:
Pierangelo Lombardo,
Salvatore Saeli,
Federica Bisio,
Davide Bernardi,
Danilo Massa
Abstract:
In the last decade, the use of fast flux technique has become established as a common practice to organise botnets in Fast Flux Service Networks (FFSNs), which are platforms able to sustain illegal online services with very high availability. In this paper, we report on an effective fast flux detection algorithm based on the passive analysis of the Domain Name System (DNS) traffic of a corporate n…
▽ More
In the last decade, the use of fast flux technique has become established as a common practice to organise botnets in Fast Flux Service Networks (FFSNs), which are platforms able to sustain illegal online services with very high availability. In this paper, we report on an effective fast flux detection algorithm based on the passive analysis of the Domain Name System (DNS) traffic of a corporate network. The proposed method is based on the near-real-time identification of different metrics that measure a wide range of fast flux key features; the metrics are combined via a simple but effective mathematical and data mining approach. The proposed solution has been evaluated in a one-month experiment over an enterprise network, with the injection of pcaps associated with different malware campaigns, that leverage FFSNs and cover a wide variety of attack scenarios. An in-depth analysis of a list of fast flux domains confirmed the reliability of the metrics used in the proposed algorithm and allowed for the identification of many IPs that turned out to be part of two notorious FFSNs, namely Dark Cloud and SandiFlux, to the description of which we therefore contribute. All the fast flux domains were detected with a very low false positive rate; a comparison of performance indicators with previous works show a remarkable improvement.
△ Less
Submitted 18 September, 2018; v1 submitted 17 April, 2018;
originally announced April 2018.