doi 10.1007/978-3-319-91476-3_3

A Typology of Data Anomalies

Abstract: Anomalies are cases that are in some way unusual and do not appear to fit the general patterns present in the dataset. Several conceptualizations exist to distinguish between different types of anomalies. However, these are either too specific to be generally applicable or so abstract that they neither provide concrete insight into the nature of anomaly types nor facilitate the functional evaluation of anomaly detection algorithms. With the recent criticism on 'black box' algorithms and analytics it has become clear that this is an undesirable situation. This paper therefore introduces a general typology of anomalies that offers a clear and tangible definition of the different types of anomalies in datasets. The typology also facilitates the evaluation of the functional capabilities of anomaly detection algorithms and as a framework assists in analyzing the conceptual levels of data, patterns and anomalies. Finally, it serves as an analytical tool for studying anomaly types from other typologies.

Submitted 4 July, 2021; originally announced July 2021.

Comments: 13 pages, 5 figures. Presented at the 17th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2018). Note: for a fully developed and more detailed typology of anomalies, see the follow-up publication 'On the Nature and Types of Anomalies: A Review of Deviations in Data'. arXiv admin note: text overlap with arXiv:2007.15634

MSC Class: 62G07

ACM Class: G.3; I.2.6; I.5

arXiv:2010.04705 [pdf]

doi 10.1109/SSCI47803.2020.9308417

Algorithmic Frameworks for the Detection of High Density Anomalies

Authors: Ralph Foorthuis

Abstract: This study explores the concept of high-density anomalies. As opposed to the traditional concept of anomalies as isolated occurrences, high-density anomalies are deviant cases positioned in the most normal regions of the data space. Such anomalies are relevant for various practical use cases, such as misbehavior detection and data quality analysis. Effective methods for identifying them are particularly important when analyzing very large or noisy sets, for which traditional anomaly detection algorithms will return many false positives. In order to be able to identify high-density anomalies, this study introduces several non-parametric algorithmic frameworks for unsupervised detection. These frameworks are able to leverage existing underlying anomaly detection algorithms and offer different solutions for the balancing problem inherent in this detection task. The frameworks are evaluated with both synthetic and real-world datasets, and are compared with existing baseline algorithms for detecting traditional anomalies. The Iterative Partial Push (IPP) framework proves to yield the best detection results.

Submitted 4 April, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

Comments: 10 pages, 9 figures, 6 tables. Accepted for presentation at IEEE SSCI CIDM 2020 (Symposium on Computational Intelligence in Data Mining)

MSC Class: 62G07

ACM Class: G.3; I.2.6; I.5

arXiv:2008.12330 [pdf]

The Impact of Discretization Method on the Detection of Six Types of Anomalies in Datasets

Authors: Ralph Foorthuis

Abstract: Anomaly detection is the process of identifying cases, or groups of cases, that are in some way unusual and do not fit the general patterns present in the dataset. Numerous algorithms use discretization of numerical data in their detection processes. This study investigates the effect of the discretization method on the unsupervised detection of each of the six anomaly types acknowledged in a recent typology of data anomalies. To this end, experiments are conducted with various datasets and SECODA, a general-purpose algorithm for unsupervised non-parametric anomaly detection in datasets with numerical and categorical attributes. This algorithm employs discretization of continuous attributes, exponentially increasing weights and discretization cut points, and a pruning heuristic to detect anomalies with an optimal number of iterations. The results demonstrate that standard SECODA can detect all six types, but that different discretization methods favor the discovery of certain anomaly types. The main findings also hold for other detection techniques using discretization.

Submitted 27 August, 2020; originally announced August 2020.

Comments: 16 pages, 5 figures, 2 tables. Presented at the 30th Benelux Conference on Artificial Intelligence (BNAIC 2018)

MSC Class: 62G07

ACM Class: G.3; I.2.6; I.5

arXiv:2008.11026 [pdf]

On Course, But Not There Yet: Enterprise Architecture Conformance and Benefits in Systems Development

Authors: Ralph Foorthuis, Marlies van Steenbergen, Nino Mushkudiani, Wiel Bruls, Sjaak Brinkkemper, Rik Bos

Abstract: Various claims have been made regarding the benefits that Enterprise Architecture (EA) delivers for both individual systems development projects and the organization as a whole. This paper presents the statistical findings of a survey study (n=293) carried out to empirically test these claims. First, we investigated which techniques are used in practice to stimulate conformance to EA. Secondly, we studied which benefits are actually gained. Thirdly, we verified whether EA creators (e.g. enterprise architects) and EA users (e.g. project members) differ in their perceptions regarding EA. Finally, we investigated which of the applied techniques most effectively increase project conformance to and effectiveness of EA. A multivariate regression analysis demonstrates that three techniques have a major impact on conformance: carrying out compliance assessments, management propagation of EA and providing assistance to projects. Although project conformance plays a central role in reaping various benefits at both the organizational and the project level, it is shown that a number of important benefits have not yet been fully achieved.

Submitted 23 August, 2020; originally announced August 2020.

Comments: 19 pages (excluding cover pages), 2 figures, 11 tables. Proceedings of the Thirty First International Conference on Information Systems (ICIS 2010), St. Louis, Missouri, USA. arXiv admin note: text overlap with arXiv:2008.08112

ACM Class: K.4.3; K.5.2

arXiv:2008.08112 [pdf]

doi 10.1007/s10796-014-9542-1

A Theory Building Study of Enterprise Architecture Practices and Benefits

Authors: Ralph Foorthuis, Marlies van Steenbergen, Sjaak Brinkkemper, Wiel Bruls

Abstract: Academics and practitioners have made various claims regarding the benefits that Enterprise Architecture (EA) delivers for both individual projects and the organization as a whole. At the same time, there is a lack of explanatory theory regarding how EA delivers these benefits. Moreover, EA practices and benefits have not been extensively investigated by empirical research, with especially quantitative studies on the topic being few and far between. This paper therefore presents the statistical findings of a theory-building survey study (n=293). The resulting PLS model is a synthesis of current implicit and fragmented theory, and shows how EA practices and intermediate benefits jointly work to help the organization reap benefits for both the organization and its projects. The model shows that EA and EA practices do not deliver benefits directly, but operate through intermediate results, most notably compliance with EA and architectural insight. Furthermore, the research identifies the EA practices that have a major impact on these results, the most important being compliance assessments, management propagation of EA, and different types of knowledge exchange. The results also demonstrate that projects play an important role in obtaining benefits from EA, but that they generally benefit less than the organization as a whole.

Submitted 18 August, 2020; originally announced August 2020.

Comments: 28 pages, 4 figures, 12 tables

ACM Class: K.4.3; K.5.2

Journal ref: Information Systems Frontiers, Vol. 18, No. 3, 2016, pp. 541-564

arXiv:2008.06869 [pdf]

doi 10.1109/DSAA.2017.35

SECODA: Segmentation- and Combination-Based Detection of Anomalies

Authors: Ralph Foorthuis

Abstract: This study introduces SECODA, a novel general-purpose unsupervised non-parametric anomaly detection algorithm for datasets containing continuous and categorical attributes. The method is guaranteed to identify cases with unique or sparse combinations of attribute values. Continuous attributes are discretized repeatedly in order to correctly determine the frequency of such value combinations. The concept of constellations, exponentially increasing weights and discretization cut points, as well as a pruning heuristic are used to detect anomalies with an optimal number of iterations. Moreover, the algorithm has a low memory imprint and its runtime performance scales linearly with the size of the dataset. An evaluation with simulated and real-life datasets shows that this algorithm is able to identify many different types of anomalies, including complex multidimensional instances. An evaluation in terms of a data quality use case with a real dataset demonstrates that SECODA can bring relevant and practical value to real-world settings.

Submitted 16 August, 2020; originally announced August 2020.

Comments: 12 pages (including DSAA conference poster), 9 figures, 3 tables. Presented at DSAA 2017, the IEEE International Conference on Data Science and Advanced Analytics

MSC Class: 62G07

ACM Class: G.3; I.2.6; I.5

arXiv:2008.03775 [pdf]

Tactics for Internal Compliance: A Literature Review

Authors: Ralph Foorthuis

Abstract: Compliance of organizations with internal and external norms is a highly relevant topic for both practitioners and academics nowadays. However, the substantive, elementary compliance tactics that organizations can use for achieving internal compliance have been described in a fragmented manner and in the literatures of distinct academic disciplines. Using a multidisciplinary structured literature review of 134 publications, this study offers three contributions. First, we present a typology of 45 compliance tactics, which constitutes a comprehensive and rich overview of elementary ways for bringing the organization into compliance. Secondly, we provide an overview of fundamental concepts in the theory of compliance, which forms the basis for the framework we developed for positioning compliance tactics and for analyzing or developing compliance strategies. Thirdly, we present insights for moving from compliance tactics to compliance strategies. In the process, and using the multidisciplinary literature review to take a bird's-eye view, we demonstrate that compliance strategies need to be regarded as a richer concept than perceived hitherto. We also show that opportunities for innovation exist.

Submitted 9 August, 2020; originally announced August 2020.

Comments: 47 pages (excl. references), 4 figures, 4 tables. Chapter of 'Project Compliance with Enterprise Architecture' (ISBN 978-90-393-5834-4)

ACM Class: K.4; K.5

arXiv:2007.15634 [pdf]

doi 10.1007/s41060-021-00265-1

On the Nature and Types of Anomalies: A Review of Deviations in Data

Authors: Ralph Foorthuis

Abstract: Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is typically ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure, and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types, and 63 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.

Submitted 29 May, 2023; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: 39 pages (30 pages content), 10 figures and 3 tables. Preprint; comments will be appreciated. Improvements in version 4: Small textual updates, added publication details on JDSA journal. International Journal of Data Science and Analytics, Springer (2021)

MSC Class: 62A01

ACM Class: G.3; I.2.6; I.5

Showing 1–8 of 8 results for author: Foorthuis, R