-
Validity and efficiency of the conformal CUSUM procedure
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Alex Gammerman
Abstract:
In this paper we study the validity and efficiency of a conformal version of the CUSUM procedure for change detection both experimentally and theoretically.
Submitted 4 December, 2024;
originally announced December 2024.
-
Conformal testing: binary case with Markov alternatives
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Alex Gammerman
Abstract:
We continue study of conformal testing in binary model situations. In this note we consider Markov alternatives to the null hypothesis of exchangeability. We propose two new classes of conformal test martingales; one class is statistically efficient in our experiments, and the other class partially sacrifices statistical efficiency to gain computational efficiency.
Submitted 2 November, 2021;
originally announced November 2021.
-
Retrain or not retrain: Conformal test martingales for change-point detection
Authors:
Vladimir Vovk,
Ivan Petej,
Ilia Nouretdinov,
Ernst Ahlberg,
Lars Carlsson,
Alex Gammerman
Abstract:
We argue for supplementing the process of training a prediction algorithm by setting up a scheme for detecting the moment when the distribution of the data changes and the algorithm needs to be retrained. Our proposed schemes are based on exchangeability martingales, i.e., processes that are martingales under any exchangeable distribution for the data. Our method, based on conformal prediction, is general and can be applied on top of any modern prediction algorithm. Its validity is guaranteed, and in this paper we make first steps in exploring its efficiency.
Submitted 20 February, 2021;
originally announced February 2021.
-
Conformal e-testing
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Alex Gammerman
Abstract:
There is a useful counterpart of conformal prediction for e-values, called conformal e-prediction. Conformal prediction can serve as basis for testing the assumption of exchangeability, leading to conformal testing. Similarly, conformal e-prediction can also serve as basis for testing. The resulting conformal e-testing looks very different from but inherits some strengths of conformal testing; it even has some advantages over conformal testing. In this paper we discuss systematically both strengths and limitations of conformal e-testing.
Submitted 2 November, 2024; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Computationally efficient versions of conformal predictive distributions
Authors:
Vladimir Vovk,
Ivan Petej,
Ilia Nouretdinov,
Valery Manokhin,
Alex Gammerman
Abstract:
Conformal predictive systems are a recent modification of conformal predictors that output, in regression problems, probability distributions for labels of test observations rather than set predictions. The extra information provided by conformal predictive systems may be useful, e.g., in decision making problems. Conformal predictive systems inherit the relative computational inefficiency of conformal predictors. In this paper we discuss two computationally efficient versions of conformal predictive systems, which we call split conformal predictive systems and cross-conformal predictive systems. The main advantage of split conformal predictive systems is their guaranteed validity, whereas for cross-conformal predictive systems validity only holds empirically and in the absence of excessive randomization. The main advantage of cross-conformal predictive systems is their greater predictive efficiency.
Submitted 3 November, 2019;
originally announced November 2019.
-
Multi-level conformal clustering: A distribution-free technique for clustering and anomaly detection
Authors:
Ilia Nouretdinov,
James Gammerman,
Matteo Fontana,
Daljit Rehal
Abstract:
In this work we present a clustering technique called multi-level conformal clustering (MLCC). The technique is hierarchical in nature because it can be performed at multiple significance levels, which yields greater insight into the data than performing it at just one level. We describe the theoretical underpinnings of MLCC, compare and contrast it with the hierarchical clustering algorithm, and then apply it to real world datasets to assess its performance. There are several advantages to using MLCC over more classical clustering techniques: Once a significance level has been set, MLCC is able to automatically select the number of clusters. Furthermore, thanks to the conformal prediction framework the resulting clustering model has a clear statistical meaning without any assumptions about the distribution of the data. This statistical robustness also allows us to perform clustering and anomaly detection simultaneously. Moreover, due to the flexibility of the conformal prediction framework, our algorithm can be used on top of many other machine learning algorithms.
Submitted 21 October, 2019; v1 submitted 17 October, 2019;
originally announced October 2019.
-
Conformal predictive distributions with kernels
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Valery Manokhin,
Alex Gammerman
Abstract:
This paper reviews the checkered history of predictive distributions in statistics and discusses two developments, one from recent literature and the other new. The first development is bringing predictive distributions into machine learning, whose early development was so deeply influenced by two remarkable groups at the Institute of Automation and Remote Control. The second development is combining predictive distributions with kernel methods, which were originated by one of those groups, including Emmanuel Braverman.
Submitted 24 October, 2017;
originally announced October 2017.
-
Inductive Conformal Martingales for Change-Point Detection
Authors:
Denis Volkhonskiy,
Ilia Nouretdinov,
Alexander Gammerman,
Vladimir Vovk,
Evgeny Burnaev
Abstract:
We consider the problem of quickest change-point detection in data streams. Classical change-point detection procedures, such as CUSUM, Shiryaev-Roberts and Posterior Probability statistics, are optimal only if the change-point model is known, which is an unrealistic assumption in typical applied problems. Instead we propose a new method for change-point detection based on Inductive Conformal Martingales, which requires only the independence and identical distribution of observations. We compare the proposed approach to standard methods, as well as to change-point detection oracles, which model a typical practical situation when we have only imprecise (albeit parametric) information about pre- and post-change data distributions. Results of comparison provide evidence that change-point detection based on Inductive Conformal Martingales is an efficient tool, capable to work under quite general conditions unlike traditional approaches.
Submitted 11 June, 2017;
originally announced June 2017.
-
Conformal Predictors for Compound Activity Prediction
Authors:
Paolo Toccacheli,
Ilia Nouretdinov,
Alexander Gammerman
Abstract:
The paper presents an application of Conformal Predictors to a chemoinformatics problem of identifying activities of chemical compounds. The paper addresses some specific challenges of this domain: a large number of compounds (training examples), high-dimensionality of feature space, sparseness and a strong class imbalance. A variant of conformal predictors called Inductive Mondrian Conformal Predictor is applied to deal with these challenges. Results are presented for several non-conformity measures (NCM) extracted from underlying algorithms and different kernels. A number of performance measures are used in order to demonstrate the flexibility of Inductive Mondrian Conformal Predictors in dealing with such a complex set of data.
Keywords: Conformal Prediction, Confidence Estimation, Chemoinformatics, Non-Conformity Measure.
Submitted 14 March, 2016;
originally announced March 2016.
-
Criteria of efficiency for conformal prediction
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Valentina Fedorova,
Ivan Petej,
Alex Gammerman
Abstract:
We study optimal conformity measures for various criteria of efficiency of classification in an idealised setting. This leads to an important class of criteria of efficiency that we call probabilistic; it turns out that the most standard criteria of efficiency used in literature on conformal prediction are not probabilistic unless the problem of classification is binary. We consider both unconditional and label-conditional conformal prediction.
Submitted 14 September, 2016; v1 submitted 14 March, 2016;
originally announced March 2016.
-
Plug-in martingales for testing exchangeability on-line
Authors:
Valentina Fedorova,
Alex Gammerman,
Ilia Nouretdinov,
Vladimir Vovk
Abstract:
A standard assumption in machine learning is the exchangeability of data, which is equivalent to assuming that the examples are generated from the same probability distribution independently. This paper is devoted to testing the assumption of exchangeability on-line: the examples arrive one by one, and after receiving each example we would like to have a valid measure of the degree to which the assumption of exchangeability has been falsified. Such measures are provided by exchangeability martingales. We extend known techniques for constructing exchangeability martingales and show that our new method is competitive with the martingales introduced before. Finally we investigate the performance of our testing method on two benchmark datasets, USPS and Statlog Satellite data; for the former, the known techniques give satisfactory results, but for the latter our new more flexible method becomes necessary.
Submitted 28 June, 2012; v1 submitted 15 April, 2012;
originally announced April 2012.
-
On-line predictive linear regression
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Alex Gammerman
Abstract:
We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. The standard treatment of prediction in linear regression analysis has two drawbacks: (1) the classical prediction intervals guarantee that the probability of error is equal to the nominal significance level $\varepsilon$, but this property per se does not imply that the long-run frequency of error is close to $\varepsilon$; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.
Submitted 17 June, 2009;
originally announced June 2009.
-
Online prediction of ovarian cancer
Authors:
Fedor Zhdanov,
Vladimir Vovk,
Brian Burford,
Dmitry Devetyarov,
Ilia Nouretdinov,
Alex Gammerman
Abstract:
In this paper we apply computer learning methods to diagnosing ovarian cancer using the level of the standard biomarker CA125 in conjunction with information provided by mass-spectrometry. We are working with a new data set collected over a period of 7 years. Using the level of CA125 and mass-spectrometry peaks, our algorithm gives probability predictions for the disease. To estimate classification accuracy we convert probability predictions into strict predictions. Our algorithm makes fewer errors than almost any linear combination of the CA125 level and one peak's intensity (taken on the log scale). To check the power of our algorithm we use it to test the hypothesis that CA125 and the peaks do not contain useful information for the prediction of the disease at a particular time before the diagnosis. Our algorithm produces $p$-values that are better than those produced by the algorithm that has been previously applied to this data set. Our conclusion is that the proposed algorithm is more reliable for prediction on new data.
Submitted 9 April, 2009;
originally announced April 2009.
-
On-line predictive linear regression
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Alex Gammerman
Abstract:
We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. We are mainly interested in prediction intervals rather than point predictions. The standard treatment of prediction intervals in linear regression analysis has two drawbacks: (1) the classical prediction intervals guarantee that the probability of error is equal to the nominal significance level epsilon, but this property per se does not imply that the long-run frequency of error is close to epsilon; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.
Submitted 21 November, 2011; v1 submitted 21 November, 2005;
originally announced November 2005.
-
Defensive forecasting for linear protocols
Authors:
Vladimir Vovk,
Ilia Nouretdinov,
Akimichi Takemura,
Glenn Shafer
Abstract:
We consider a general class of forecasting protocols, called "linear protocols", and discuss several important special cases, including multi-class forecasting. Forecasting is formalized as a game between three players: Reality, whose role is to generate observations; Forecaster, whose goal is to predict the observations; and Skeptic, who tries to make money on any lack of agreement between Forecaster's predictions and the actual observations. Our main mathematical result is that for any continuous strategy for Skeptic in a linear protocol there exists a strategy for Forecaster that does not allow Skeptic's capital to grow. This result is a meta-theorem that allows one to transform any continuous law of probability in a linear protocol into a forecasting strategy whose predictions are guaranteed to satisfy this law. We apply this meta-theorem to a weak law of large numbers in Hilbert spaces to obtain a version of the K29 prediction algorithm for linear protocols and show that this version also satisfies the attractive properties of proper calibration and resolution under a suitable choice of its kernel parameter, with no assumptions about the way the data is generated.
Submitted 24 September, 2005; v1 submitted 2 June, 2005;
originally announced June 2005.