Validity and efficiency of the conformal CUSUM procedure

Authors: Vladimir Vovk, Ilia Nouretdinov, Alex Gammerman

Abstract: In this paper we study the validity and efficiency of a conformal version of the CUSUM procedure for change detection both experimentally and theoretically.

Submitted 4 December, 2024; originally announced December 2024.

Comments: 19 pages, 7 figures

MSC Class: 62G10 (Primary) 68T05; 68Q32; 62L10 (Secondary)

arXiv:2111.01885 [pdf, other]

Conformal testing: binary case with Markov alternatives

Authors: Vladimir Vovk, Ilia Nouretdinov, Alex Gammerman

Abstract: We continue study of conformal testing in binary model situations. In this note we consider Markov alternatives to the null hypothesis of exchangeability. We propose two new classes of conformal test martingales; one class is statistically efficient in our experiments, and the other class partially sacrifices statistical efficiency to gain computational efficiency.

Submitted 2 November, 2021; originally announced November 2021.

Comments: 8 pages, 8 figures

MSC Class: 68Q32 (Primary) 62G10; 60G42 (Secondary)

arXiv:2102.10439 [pdf, other]

Retrain or not retrain: Conformal test martingales for change-point detection

Authors: Vladimir Vovk, Ivan Petej, Ilia Nouretdinov, Ernst Ahlberg, Lars Carlsson, Alex Gammerman

Abstract: We argue for supplementing the process of training a prediction algorithm by setting up a scheme for detecting the moment when the distribution of the data changes and the algorithm needs to be retrained. Our proposed schemes are based on exchangeability martingales, i.e., processes that are martingales under any exchangeable distribution for the data. Our method, based on conformal prediction, is general and can be applied on top of any modern prediction algorithm. Its validity is guaranteed, and in this paper we make first steps in exploring its efficiency.

Submitted 20 February, 2021; originally announced February 2021.

Comments: 22 pages, 19 figures, 3 tables

MSC Class: 68Q32 (Primary) 62G10; 60G42; 68T05 (Secondary)

arXiv:2006.02329 [pdf, other]

Conformal e-testing

Authors: Vladimir Vovk, Ilia Nouretdinov, Alex Gammerman

Abstract: There is a useful counterpart of conformal prediction for e-values, called conformal e-prediction. Conformal prediction can serve as basis for testing the assumption of exchangeability, leading to conformal testing. Similarly, conformal e-prediction can also serve as basis for testing. The resulting conformal e-testing looks very different from but inherits some strengths of conformal testing; it even has some advantages over conformal testing. In this paper we discuss systematically both strengths and limitations of conformal e-testing.

Submitted 2 November, 2024; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: 21 pages and 2 figures

MSC Class: 62G10 (Primary) 68T05; 68Q32; 62L10 (Secondary)

arXiv:1911.00941 [pdf, other]

Computationally efficient versions of conformal predictive distributions

Authors: Vladimir Vovk, Ivan Petej, Ilia Nouretdinov, Valery Manokhin, Alex Gammerman

Abstract: Conformal predictive systems are a recent modification of conformal predictors that output, in regression problems, probability distributions for labels of test observations rather than set predictions. The extra information provided by conformal predictive systems may be useful, e.g., in decision making problems. Conformal predictive systems inherit the relative computational inefficiency of conformal predictors. In this paper we discuss two computationally efficient versions of conformal predictive systems, which we call split conformal predictive systems and cross-conformal predictive systems. The main advantage of split conformal predictive systems is their guaranteed validity, whereas for cross-conformal predictive systems validity only holds empirically and in the absence of excessive randomization. The main advantage of cross-conformal predictive systems is their greater predictive efficiency.

Submitted 3 November, 2019; originally announced November 2019.

Comments: 31 pages, 14 figures, 1 table. The conference version published in the Proceedings of COPA 2018, and the journal version is to appear in Neurocomputing

MSC Class: 68T05

arXiv:1910.08105 [pdf, other]

doi 10.1016/j.neucom.2019.07.114

Multi-level conformal clustering: A distribution-free technique for clustering and anomaly detection

Authors: Ilia Nouretdinov, James Gammerman, Matteo Fontana, Daljit Rehal

Abstract: In this work we present a clustering technique called multi-level conformal clustering (MLCC). The technique is hierarchical in nature because it can be performed at multiple significance levels which yields greater insight into the data than performing it at just one level. We describe the theoretical underpinnings of MLCC, compare and contrast it with the hierarchical clustering algorithm, and then apply it to real world datasets to assess its performance. There are several advantages to using MLCC over more classical clustering techniques: Once a significance level has been set, MLCC is able to automatically select the number of clusters. Furthermore, thanks to the conformal prediction framework the resulting clustering model has a clear statistical meaning without any assumptions about the distribution of the data. This statistical robustness also allows us to perform clustering and anomaly detection simultaneously. Moreover, due to the flexibility of the conformal prediction framework, our algorithm can be used on top of many other machine learning algorithms.

Submitted 21 October, 2019; v1 submitted 17 October, 2019; originally announced October 2019.

arXiv:1710.08894 [pdf, other]

Conformal predictive distributions with kernels

Authors: Vladimir Vovk, Ilia Nouretdinov, Valery Manokhin, Alex Gammerman

Abstract: This paper reviews the checkered history of predictive distributions in statistics and discusses two developments, one from recent literature and the other new. The first development is bringing predictive distributions into machine learning, whose early development was so deeply influenced by two remarkable groups at the Institute of Automation and Remote Control. The second development is combining predictive distributions with kernel methods, which were originated by one of those groups, including Emmanuel Braverman.

Submitted 24 October, 2017; originally announced October 2017.

Comments: 20 pages, 3 figures, prepared for the Proceedings of the Braverman Readings (Boston, 28-30 April 2017)

MSC Class: 68Q32 (Primary) 68T05; 62M20; 60G25; 62J07; 62G08; 62F15 (Secondary)

arXiv:1706.03415 [pdf, other]

Inductive Conformal Martingales for Change-Point Detection

Authors: Denis Volkhonskiy, Ilia Nouretdinov, Alexander Gammerman, Vladimir Vovk, Evgeny Burnaev

Abstract: We consider the problem of quickest change-point detection in data streams. Classical change-point detection procedures, such as CUSUM, Shiryaev-Roberts and Posterior Probability statistics, are optimal only if the change-point model is known, which is an unrealistic assumption in typical applied problems. Instead we propose a new method for change-point detection based on Inductive Conformal Martingales, which requires only the independence and identical distribution of observations. We compare the proposed approach to standard methods, as well as to change-point detection oracles, which model a typical practical situation when we have only imprecise (albeit parametric) information about pre- and post-change data distributions. Results of comparison provide evidence that change-point detection based on Inductive Conformal Martingales is an efficient tool, capable to work under quite general conditions unlike traditional approaches.

Submitted 11 June, 2017; originally announced June 2017.

Comments: 22 pages, 9 figures, 5 tables

arXiv:1603.04506 [pdf, other]

Conformal Predictors for Compound Activity Prediction

Authors: Paolo Toccacheli, Ilia Nouretdinov, Alexander Gammerman

Abstract: The paper presents an application of Conformal Predictors to a chemoinformatics problem of identifying activities of chemical compounds. The paper addresses some specific challenges of this domain: a large number of compounds (training examples), high-dimensionality of feature space, sparseness and a strong class imbalance. A variant of conformal predictors called Inductive Mondrian Conformal Predictor is applied to deal with these challenges. Results are presented for several non-conformity measures (NCM) extracted from underlying algorithms and different kernels. A number of performance measures are used in order to demonstrate the flexibility of Inductive Mondrian Conformal Predictors in dealing with such a complex set of data. Keywords: Conformal Prediction, Confidence Estimation, Chemoinformatics, Non-Conformity Measure.

Submitted 14 March, 2016; originally announced March 2016.

Comments: 17 pages, 5 figures

arXiv:1603.04416 [pdf, other]

Criteria of efficiency for conformal prediction

Authors: Vladimir Vovk, Ilia Nouretdinov, Valentina Fedorova, Ivan Petej, Alex Gammerman

Abstract: We study optimal conformity measures for various criteria of efficiency of classification in an idealised setting. This leads to an important class of criteria of efficiency that we call probabilistic; it turns out that the most standard criteria of efficiency used in literature on conformal prediction are not probabilistic unless the problem of classification is binary. We consider both unconditional and label-conditional conformal prediction.

Submitted 14 September, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

Comments: 31 pages

MSC Class: 68T05

ACM Class: I.2.6

arXiv:1204.3251 [pdf, other]

Plug-in martingales for testing exchangeability on-line

Authors: Valentina Fedorova, Alex Gammerman, Ilia Nouretdinov, Vladimir Vovk

Abstract: A standard assumption in machine learning is the exchangeability of data, which is equivalent to assuming that the examples are generated from the same probability distribution independently. This paper is devoted to testing the assumption of exchangeability on-line: the examples arrive one by one, and after receiving each example we would like to have a valid measure of the degree to which the assumption of exchangeability has been falsified. Such measures are provided by exchangeability martingales. We extend known techniques for constructing exchangeability martingales and show that our new method is competitive with the martingales introduced before. Finally we investigate the performance of our testing method on two benchmark datasets, USPS and Statlog Satellite data; for the former, the known techniques give satisfactory results, but for the latter our new more flexible method becomes necessary.

Submitted 28 June, 2012; v1 submitted 15 April, 2012; originally announced April 2012.

Comments: 8 pages, 7 figures; ICML 2012 Conference Proceedings

Report number: On-line Compression Modelling Project (New Series), Working Paper 04

MSC Class: 62G10

ACM Class: I.2.6

arXiv:0906.3123 [pdf, ps, other]

doi 10.1214/08-AOS622

On-line predictive linear regression

Authors: Vladimir Vovk, Ilia Nouretdinov, Alex Gammerman

Abstract: We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. The standard treatment of prediction in linear regression analysis has two drawbacks: (1) the classical prediction intervals guarantee that the probability of error is equal to the nominal significance level $\varepsilon$, but this property per se does not imply that the long-run frequency of error is close to $\varepsilon$; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.

Submitted 17 June, 2009; originally announced June 2009.

Comments: Published at http://dx.doi.org/10.1214/08-AOS622 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS622

MSC Class: 62J05; 62G08 (Primary) 60G25; 68Q32 (Secondary)

Journal ref: Annals of Statistics 2009, Vol. 37, No. 3, 1566-1590

arXiv:0904.1579 [pdf, ps, other]

Online prediction of ovarian cancer

Authors: Fedor Zhdanov, Vladimir Vovk, Brian Burford, Dmitry Devetyarov, Ilia Nouretdinov, Alex Gammerman

Abstract: In this paper we apply computer learning methods to diagnosing ovarian cancer using the level of the standard biomarker CA125 in conjunction with information provided by mass-spectrometry. We are working with a new data set collected over a period of 7 years. Using the level of CA125 and mass-spectrometry peaks, our algorithm gives probability predictions for the disease. To estimate classification accuracy we convert probability predictions into strict predictions. Our algorithm makes fewer errors than almost any linear combination of the CA125 level and one peak's intensity (taken on the log scale). To check the power of our algorithm we use it to test the hypothesis that CA125 and the peaks do not contain useful information for the prediction of the disease at a particular time before the diagnosis. Our algorithm produces $p$-values that are better than those produced by the algorithm that has been previously applied to this data set. Our conclusion is that the proposed algorithm is more reliable for prediction on new data.

Submitted 9 April, 2009; originally announced April 2009.

Comments: 11 pages, 4 figures, uses llncs.cls

ACM Class: I.2.1

arXiv:math/0511522 [pdf, ps, other]

doi 10.1214/08-AOS622

On-line predictive linear regression

Authors: Vladimir Vovk, Ilia Nouretdinov, Alex Gammerman

Abstract: We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. We are mainly interested in prediction intervals rather than point predictions. The standard treatment of prediction intervals in linear regression analysis has two drawbacks: (1) the classical prediction intervals guarantee that the probability of error is equal to the nominal significance level epsilon, but this property per se does not imply that the long-run frequency of error is close to epsilon; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.

Submitted 21 November, 2011; v1 submitted 21 November, 2005; originally announced November 2005.

Comments: 34 pages; 6 figures; 1 table. arXiv admin note: substantial text overlap with arXiv:0906.3123

Report number: On-line Compression Modelling Project (New Series), Working Paper 01

MSC Class: 62G08; 62J07

Journal ref: Annals of Statistics 37:1566-1590 (2009)

arXiv:cs/0506007 [pdf, ps, other]

Defensive forecasting for linear protocols

Authors: Vladimir Vovk, Ilia Nouretdinov, Akimichi Takemura, Glenn Shafer

Abstract: We consider a general class of forecasting protocols, called "linear protocols", and discuss several important special cases, including multi-class forecasting. Forecasting is formalized as a game between three players: Reality, whose role is to generate observations; Forecaster, whose goal is to predict the observations; and Skeptic, who tries to make money on any lack of agreement between Forecaster's predictions and the actual observations. Our main mathematical result is that for any continuous strategy for Skeptic in a linear protocol there exists a strategy for Forecaster that does not allow Skeptic's capital to grow. This result is a meta-theorem that allows one to transform any continuous law of probability in a linear protocol into a forecasting strategy whose predictions are guaranteed to satisfy this law. We apply this meta-theorem to a weak law of large numbers in Hilbert spaces to obtain a version of the K29 prediction algorithm for linear protocols and show that this version also satisfies the attractive properties of proper calibration and resolution under a suitable choice of its kernel parameter, with no assumptions about the way the data is generated.

Submitted 24 September, 2005; v1 submitted 2 June, 2005; originally announced June 2005.

Comments: 16 pages

ACM Class: I.2.6; I.5.1

Showing 1–15 of 15 results for author: Nouretdinov, I