Search | arXiv e-print repository

arXiv:2006.08530 [pdf, other]

Selecting the Number of Clusters $K$ with a Stability Trade-off: an Internal Validation Criterion

Authors: Alex Mourer, Florent Forest, Mustapha Lebbah, Hanane Azzag, Jérôme Lacaille

Abstract: Model selection is a major challenge in non-parametric clustering. There is no universally admitted way to evaluate clustering results for the obvious reason that no ground truth is available. The difficulty to find a universal evaluation criterion is a consequence of the ill-defined objective of clustering. In this perspective, clustering stability has emerged as a natural and model-agnostic principle: an algorithm should find stable structures in the data. If data sets are repeatedly sampled from the same underlying distribution, an algorithm should find similar partitions. However, stability alone is not well-suited to determine the number of clusters. For instance, it is unable to detect if the number of clusters is too small. We propose a new principle: a good clustering should be stable, and within each cluster, there should exist no stable partition. This principle leads to a novel clustering validation criterion based on between-cluster and within-cluster stability, overcoming limitations of previous stability-based methods. We empirically demonstrate the effectiveness of our criterion to select the number of clusters and compare it with existing methods. Code is available at https://github.com/FlorentF9/skstab.

Submitted 16 May, 2023; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: Accepted at PAKDD 2023

MSC Class: 62H30 (Primary), 68T10 (Secondary)

ACM Class: I.5.3

arXiv:1508.04154 [pdf, ps, other]

doi 10.1007/978-3-319-07695-9_14

Anomaly Detection Based on Confidence Intervals Using SOM with an Application to Health Monitoring

Authors: Anastasios Bellas, Charles Bouveyron, Marie Cottrell, Jerome Lacaille

Abstract: We develop an application of SOM for the task of anomaly detection and visualization. To remove the effect of exogenous independent variables, we use a correction model which is more accurate than the usual one, since we apply different linear models in each cluster of context. We do not assume any particular probability distribution of the data and the detection method is based on the distance of new data to the Kohonen map learned with corrected healthy data. We apply the proposed method to the detection of aircraft engine anomalies.

Submitted 30 June, 2015; originally announced August 2015.

Journal ref: T. Villmann, F.M. Schleif, M. Kaden, M. Lange. 10th International Workshop on Self-Organizing Maps, Jul 2014, Mittweida, Germany. Springer, 295, pp.145-155, 2014, Advances in Self-Organizing Maps and Learning Vector Quantization AISC
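The detection step described in the abstract above (distance of new data to a map learned on healthy data, with no distributional assumption) can be illustrated with a minimal sketch. This is not the authors' implementation: the prototype list stands in for an already trained Kohonen map, the empirical-quantile threshold is an assumption chosen for illustration, and the names `distance_to_map`, `empirical_threshold` and `is_anomalous` are hypothetical. The context-dependent correction model mentioned in the abstract is omitted.

```python
def distance_to_map(prototypes, x):
    """Euclidean distance from x to its closest map prototype."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, x)) ** 0.5
        for p in prototypes
    )


def empirical_threshold(prototypes, healthy, quantile=0.95):
    """Distribution-free threshold: an empirical quantile of the distances
    observed on healthy data (an assumption made for this sketch)."""
    dists = sorted(distance_to_map(prototypes, x) for x in healthy)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[index]


def is_anomalous(prototypes, x, threshold):
    """Flag x when it lies farther from the map than the healthy threshold."""
    return distance_to_map(prototypes, x) > threshold
```

A new observation is flagged only when it is farther from every prototype than nearly all healthy observations were, so the rule depends on the observed distances alone and not on a parametric model of the data.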

arXiv:1506.04177 [pdf, ps, other]

Search Strategies for Binary Feature Selection for a Naive Bayes Classifier

Authors: Tsirizo Rabenoro, Jérôme Lacaille, Marie Cottrell, Fabrice Rossi

Abstract: We compare in this paper several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches guided by the NBC estimation of the classification error probability out-perform filter approaches while retaining a reasonable computational cost.

Submitted 12 June, 2015; originally announced June 2015.

Journal ref: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Apr 2015, Bruges, Belgium. pp.291-296, 2015, Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015)
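As a rough illustration of a wrapper approach for a Naive Bayes classifier on binary indicators, the sketch below runs a greedy forward search. It is not the paper's procedure: it scores candidate subsets with a plain hold-out error rate instead of the NBC's own estimate of the error probability, and the function names (`fit_nb`, `predict_nb`, `error_rate`, `forward_selection`) are hypothetical.

```python
import math


def fit_nb(X, y, features):
    """Fit a Bernoulli Naive Bayes model restricted to the given feature indices."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # Laplace-smoothed estimate of P(indicator = 1 | class)
        probs = {j: (sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in features}
        model[c] = (prior, probs)
    return model


def predict_nb(model, x):
    """Return the class with the highest log-posterior score."""
    best_class, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for j, p in probs.items():
            score += math.log(p if x[j] else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def error_rate(X_train, y_train, X_val, y_val, features):
    """Hold-out misclassification rate for one feature subset."""
    model = fit_nb(X_train, y_train, features)
    wrong = sum(predict_nb(model, x) != t for x, t in zip(X_val, y_val))
    return wrong / len(y_val)


def forward_selection(X_train, y_train, X_val, y_val, max_features):
    """Greedily add the indicator that lowers the validation error the most."""
    selected = []
    best_err = error_rate(X_train, y_train, X_val, y_val, selected)
    remaining = list(range(len(X_train[0])))
    while remaining and len(selected) < max_features:
        err, j = min(
            (error_rate(X_train, y_train, X_val, y_val, selected + [j]), j)
            for j in remaining
        )
        if err >= best_err:
            break  # no candidate improves the current subset
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err
```

Because the search stops as soon as no candidate improves the error, a redundant copy of an already selected indicator is never added, which is the behaviour one wants when many indicators are redundant.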

arXiv:1503.05526 [pdf, other]

Interpretable Aircraft Engine Diagnostic via Expert Indicator Aggregation

Authors: Tsirizo Rabenoro, Jérôme Lacaille, Marie Cottrell, Fabrice Rossi

Abstract: Detecting early signs of failures (anomalies) in complex systems is one of the main goals of preventive maintenance. It allows in particular to avoid actual failures by (re)scheduling maintenance operations in a way that optimizes maintenance costs. Aircraft engine health monitoring is one representative example of a field in which anomaly detection is crucial. Manufacturers collect large amounts of engine-related data during flights which are used, among other applications, to detect anomalies. This article introduces and studies a generic methodology that allows one to build automatic early signs of anomaly detection in a way that builds upon human expertise and that remains understandable by human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. A feature selection method is used to keep only the most discriminant indicators which are used as inputs of a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real world engines.

Submitted 18 March, 2015; originally announced March 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1408.6214, arXiv:1409.4747, arXiv:1407.0880

Journal ref: Transactions on Machine Learning and Data Mining, 2014, 7 (2), pp.39-64

arXiv:1409.4747 [pdf, other]

doi 10.1109/IJCNN.2014.6889841

Anomaly Detection Based on Indicators Aggregation

Authors: Tsirizo Rabenoro, Jérôme Lacaille, Marie Cottrell, Fabrice Rossi

Abstract: Automatic anomaly detection is a major issue in various areas. Beyond mere detection, the identification of the source of the problem that produced the anomaly is also essential. This is particularly the case in aircraft engine health monitoring where detecting early signs of failure (anomalies) and helping the engine owner to implement efficiently the adapted maintenance operations (fixing the source of the anomaly) are of crucial importance to reduce the costs attached to unscheduled maintenance. This paper introduces a general methodology that aims at classifying monitoring signals into normal ones and several classes of abnormal ones. The main idea is to leverage expert knowledge by generating a very large number of binary indicators. Each indicator corresponds to a fully parametrized anomaly detector built from parametric anomaly scores designed by experts. A feature selection method is used to keep only the most discriminant indicators which are used as inputs of a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real world engines.

Submitted 16 September, 2014; originally announced September 2014.

Comments: International Joint Conference on Neural Networks (IJCNN 2014), Beijing, China (2014). arXiv admin note: substantial text overlap with arXiv:1407.0880

arXiv:1408.6214 [pdf, other]

doi 10.1007/978-3-319-08976-8_11

A Methodology for the Diagnostic of Aircraft Engine Based on Indicators Aggregation

Authors: Tsirizo Rabenoro, Jérôme Lacaille, Marie Cottrell, Fabrice Rossi

Abstract: Aircraft engine manufacturers collect large amounts of engine-related data during flights. These data are used to detect anomalies in the engines in order to help companies optimize their maintenance costs. This article introduces and studies a generic methodology that allows one to build automatic early signs of anomaly detection in a way that is understandable by human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. The best indicators are selected via a classical forward scheme, leading to a much reduced number of indicators that are tuned to a data set. We illustrate the interest of the method on simulated data which contain realistic early signs of anomalies.

Submitted 26 August, 2014; originally announced August 2014.

Comments: Proceedings of the 14th Industrial Conference, ICDM 2014, St. Petersburg, Russian Federation (2014)

arXiv:1407.0880 [pdf, other]

Anomaly Detection Based on Aggregation of Indicators

Authors: Tsirizo Rabenoro, Jérôme Lacaille, Marie Cottrell, Fabrice Rossi

Abstract: Automatic anomaly detection is a major issue in various areas. Beyond mere detection, the identification of the origin of the problem that produced the anomaly is also essential. This paper introduces a general methodology that can assist human operators who aim at classifying monitoring signals. The main idea is to leverage expert knowledge by generating a very large number of indicators. A feature selection method is used to keep only the most discriminant indicators which are used as inputs of a Naive Bayes classifier. The parameters of the classifier have been optimized indirectly by the selection process. The methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real world engines.

Submitted 16 September, 2014; v1 submitted 3 July, 2014; originally announced July 2014.

Comments: 23rd annual Belgian-Dutch Conference on Machine Learning (Benelearn 2014), Brussels, Belgium (2014)

arXiv:0907.1368 [pdf]

Fault prediction in aircraft engines using Self-Organizing Maps

Authors: Marie Cottrell, Patrice Gaubert, Cédric Eloy, Damien François, Geoffroy Hallaux, Jérôme Lacaille, Michel Verleysen

Abstract: Aircraft engines are designed to be used during several tens of years. Their maintenance is a challenging and costly task, for obvious security reasons. The goal is to ensure a proper operation of the engines, in all conditions, with a zero probability of failure, while taking into account aging. The fact that the same engine is sometimes used on several aircraft has to be taken into account too. The maintenance can be improved if an efficient procedure for the prediction of failures is implemented. The primary source of information on the health of the engines comes from measurement during flights. Several variables such as the core speed, the oil pressure and quantity, the fan speed, etc. are measured, together with environmental variables such as the outside temperature, altitude, aircraft speed, etc. In this paper, we describe the design of a procedure aiming at visualizing successive data measured on aircraft engines. The data are multi-dimensional measurements on the engines, which are projected on a self-organizing map in order to allow us to follow the trajectories of these data over time. The trajectories consist of a succession of points on the map, each of them corresponding to the two-dimensional projection of the multi-dimensional vector of engine measurements. Analyzing the trajectories aims at visualizing any deviation from a normal behavior, making it possible to anticipate an operation failure.

Submitted 8 July, 2009; originally announced July 2009.

Comments: Paper presented at the 7th International Workshop WSOM 09, St Augustine, Florida, USA, June 2009

Journal ref: Advances in Self-Organizing Maps, José Principe, Risto Miikkulainen (Ed.) (2009) 37-44
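The projection described in the abstract above, where each multi-dimensional measurement becomes a point on the two-dimensional map and successive measurements form a trajectory, can be sketched as follows. The codebook is assumed to come from an already trained self-organizing map and is represented here as a dictionary from grid position to prototype vector; the names `best_matching_unit` and `trajectory` are hypothetical and training is not shown.

```python
def best_matching_unit(codebook, x):
    """Return the grid position (row, col) whose prototype is closest to x."""
    best_pos, best_dist = None, float("inf")
    for pos, prototype in codebook.items():
        dist = sum((a - b) ** 2 for a, b in zip(prototype, x))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


def trajectory(codebook, measurements):
    """Project successive measurement vectors onto the map, in time order."""
    return [best_matching_unit(codebook, x) for x in measurements]
```

Plotting the returned grid positions in order gives the path of an engine across the map, and a path drifting away from the region occupied by normal behavior is the kind of deviation the abstract refers to.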

Showing 1–8 of 8 results for author: Lacaille, J