-
Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks
Authors:
Thomas Massena,
Léo Andéol,
Thibaut Boissin,
Franck Mamalet,
Corentin Friedrich,
Mathieu Serrurier,
Sébastien Gerchinovitz
Abstract:
Conformal Prediction (CP) has proven to be an effective post-hoc method for improving the trustworthiness of neural networks by providing prediction sets with finite-sample guarantees. However, under adversarial attacks, classical conformal guarantees no longer hold: this problem is addressed in the field of Robust Conformal Prediction. Several methods have been proposed to provide robust CP sets with guarantees under adversarial perturbations, but, for large-scale problems, these sets are either too large or the methods are too computationally demanding to be deployed in real-life scenarios. In this work, we propose a new method that leverages Lipschitz-bounded networks to precisely and efficiently estimate robust CP sets. When combined with a 1-Lipschitz robust network, we demonstrate that our lip-rcp method outperforms state-of-the-art results in both the size of the robust CP sets and computational efficiency in medium and large-scale scenarios such as ImageNet. Taking a different angle, we also study vanilla CP under attack, and derive new worst-case coverage bounds of vanilla CP sets, which are valid simultaneously for all adversarial attack levels. Our lip-rcp method makes this second approach as efficient as vanilla CP while also allowing robustness guarantees.
Submitted 10 June, 2025; v1 submitted 5 June, 2025;
originally announced June 2025.
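As a rough illustration of the idea summarized in the entry above, the sketch below shows split conformal prediction together with a conservative, Lipschitz-based inflation of the threshold. It is not the authors' lip-rcp implementation. It assumes a hypothetical nonconformity score that is `lipschitz`-Lipschitz in the input for every label, so that a perturbation of norm at most `eps` can change any score by at most `lipschitz * eps`. All function and parameter names are illustrative.

```python
import math
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration points: only the full set is valid
    return float(np.sort(cal_scores)[k - 1])

def prediction_set(test_scores, threshold):
    """Labels whose nonconformity score is at most the threshold."""
    return [int(y) for y in np.flatnonzero(np.asarray(test_scores) <= threshold)]

def robust_prediction_set(test_scores, threshold, lipschitz, eps):
    """Assumed robust variant: inflate the threshold by the worst-case score shift."""
    return prediction_set(test_scores, threshold + lipschitz * eps)

# Toy usage with made-up scores (one score per candidate label).
q = conformal_quantile(np.array([0.4, 0.1, 0.3, 0.2]), alpha=0.5)
vanilla = prediction_set([0.25, 0.35, 0.9], q)
robust = robust_prediction_set([0.25, 0.35, 0.9], q, lipschitz=1.0, eps=0.1)
```

Under the stated assumption, the inflated set can only grow relative to the vanilla set, which is the price paid for coverage under perturbation; a smaller Lipschitz constant keeps that growth small.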
-
An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures
Authors:
Thibaut Boissin,
Franck Mamalet,
Thomas Fel,
Agustin Martin Picard,
Thomas Massena,
Mathieu Serrurier
Abstract:
Orthogonal convolutional layers are valuable components in multiple areas of machine learning, such as adversarial robustness, normalizing flows, GANs, and Lipschitz-constrained models. Their ability to preserve norms and ensure stable gradient propagation makes them useful for a wide range of problems. Despite their promise, the deployment of orthogonal convolution in large-scale applications is a significant challenge due to computational overhead and limited support for modern features like strides, dilations, group convolutions, and transposed convolutions. In this paper, we introduce AOC (Adaptive Orthogonal Convolution), a scalable method that extends a previous method (BCOP), effectively overcoming existing limitations in the construction of orthogonal convolutions. This advancement unlocks the construction of architectures that were previously considered impractical. We demonstrate through our experiments that our method produces expressive models that become increasingly efficient as they scale. To foster further advancement, we provide an open-source Python package implementing this method, called Orthogonium (https://github.com/deel-ai/orthogonium).
Submitted 4 June, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
DP-SGD Without Clipping: The Lipschitz Neural Network Way
Authors:
Louis Bethune,
Thomas Massena,
Thibaut Boissin,
Yannick Prudent,
Corentin Friedrich,
Franck Mamalet,
Aurelien Bellet,
Mathieu Serrurier,
David Vigouroux
Abstract:
State-of-the-art approaches for training Differentially Private (DP) Deep Neural Networks (DNN) struggle to estimate tight bounds on the sensitivity of the network's layers, and instead rely on a process of per-sample gradient clipping. This clipping process not only biases the direction of gradients but also proves costly both in memory consumption and in computation. To provide sensitivity bounds and bypass the drawbacks of the clipping process, we propose to rely on Lipschitz-constrained networks. Our theoretical analysis reveals an unexplored link between the Lipschitz constant of these networks with respect to their input and the one with respect to their parameters. By bounding the Lipschitz constant of each layer with respect to its parameters, we prove that we can train these networks with privacy guarantees. Our analysis not only allows the computation of the aforementioned sensitivities at scale, but also provides guidance on how to maximize the gradient-to-noise ratio for fixed privacy guarantees. The code has been released as a Python package available at https://github.com/Algue-Rythme/lip-dp
Submitted 22 February, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Robust One-Class Classification with Signed Distance Function using 1-Lipschitz Neural Networks
Authors:
Louis Bethune,
Paul Novello,
Thibaut Boissin,
Guillaume Coiffier,
Mathieu Serrurier,
Quentin Vincenot,
Andres Troya-Galvis
Abstract:
We propose a new method, dubbed One Class Signed Distance Function (OCSDF), to perform One Class Classification (OCC) by provably learning the Signed Distance Function (SDF) to the boundary of the support of any distribution. The distance to the support can be interpreted as a normality score, and its approximation using 1-Lipschitz neural networks provides robustness bounds against $\ell_2$ adversarial attacks, an under-explored weakness of deep learning-based OCC algorithms. As a result, OCSDF comes with a new metric, certified AUROC, that can be computed at the same cost as any classical AUROC. We show that OCSDF is competitive against competing methods on tabular and image data while being far more robust to adversarial attacks, illustrating its theoretical properties. Finally, as exploratory research perspectives, we theoretically and empirically show how OCSDF connects OCC with image generation and implicit neural surface parametrization. Our code is available at https://github.com/Algue-Rythme/OneClassMetricLearning
Submitted 1 April, 2024; v1 submitted 26 January, 2023;
originally announced March 2023.
-
On the explainable properties of 1-Lipschitz Neural Networks: An Optimal Transport Perspective
Authors:
Mathieu Serrurier,
Franck Mamalet,
Thomas Fel,
Louis Béthune,
Thibaut Boissin
Abstract:
Input gradients have a pivotal role in a variety of applications, including adversarial attack algorithms for evaluating model robustness, explainable AI techniques for generating Saliency Maps, and counterfactual explanations. However, Saliency Maps generated by traditional neural networks are often noisy and provide limited insights. In this paper, we demonstrate that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks, learned with the dual loss of an optimal transportation problem, exhibit desirable XAI properties: they are highly concentrated on the essential parts of the image with low noise, significantly outperforming state-of-the-art explanation approaches across various models and metrics. We also prove that these maps align unprecedentedly well with human explanations on ImageNet. To explain the particularly beneficial properties of the Saliency Map for such models, we prove this gradient encodes both the direction of the transportation plan and the direction towards the nearest adversarial attack. Following the gradient down to the decision boundary is no longer considered an adversarial attack, but rather a counterfactual explanation that explicitly transports the input from one class to another. Thus, learning with such a loss jointly optimizes the classification objective and the alignment of the gradient, i.e. the Saliency Map, to the transportation plan direction. These networks were previously known to be certifiably robust by design, and we demonstrate that they scale well for large problems and models, and are tailored for explainability using a fast and straightforward method.
Submitted 2 February, 2024; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Precipitation Nowcasting using Deep Neural Network
Authors:
Mohamed Chafik Bakkay,
Mathieu Serrurier,
Valentin Kivachuk Burda,
Florian Dupuy,
Naty Citlali Cabrera-Gutierrez,
Michael Zamo,
Maud-Alix Mader,
Olivier Mestre,
Guillaume Oller,
Jean-Christophe Jouhaud,
Laurent Terray
Abstract:
Precipitation nowcasting is of great importance for weather forecast users, for activities ranging from outdoor activities and sports competitions to airport traffic management. In contrast to long-term precipitation forecasts, which are traditionally obtained from numerical models, precipitation nowcasting needs to be very fast. It is therefore more challenging to obtain because of this time constraint. Recently, many machine-learning-based methods have been proposed. We propose the use of three popular deep learning models (U-net, ConvLSTM and SVG-LP) trained on two-dimensional precipitation maps for precipitation nowcasting. We propose an algorithm for patch extraction to obtain high-resolution precipitation maps. We also propose a loss function to solve the blurry image issue and to reduce the influence of zero-value pixels in precipitation maps.
Submitted 24 March, 2022;
originally announced March 2022.
-
Pay attention to your loss: understanding misconceptions about 1-Lipschitz neural networks
Authors:
Louis Béthune,
Thibaut Boissin,
Mathieu Serrurier,
Franck Mamalet,
Corentin Friedrich,
Alberto González-Sanz
Abstract:
Lipschitz-constrained networks have gathered considerable attention in the deep learning community, with uses ranging from Wasserstein distance estimation to the training of certifiably robust classifiers. However, they remain commonly considered as less accurate, and their properties in learning are still not fully understood. In this paper we clarify the matter: when it comes to classification, 1-Lipschitz neural networks enjoy several advantages over their unconstrained counterparts. First, we show that these networks are as accurate as classical ones, and can fit arbitrarily difficult boundaries. Then, relying on a robustness metric that reflects operational needs, we characterize the most robust classifier: the WGAN discriminator. Next, we show that 1-Lipschitz neural networks generalize well under milder assumptions. Finally, we show that hyper-parameters of the loss are crucial for controlling the accuracy-robustness trade-off. We conclude that they exhibit appealing properties to pave the way toward provably accurate and provably robust neural networks.
Submitted 17 October, 2022; v1 submitted 11 April, 2021;
originally announced April 2021.
-
A Hölderian backtracking method for min-max and min-min problems
Authors:
Jérôme Bolte,
Lilian Glaudin,
Edouard Pauwels,
Mathieu Serrurier
Abstract:
We present a new algorithm to solve min-max or min-min problems outside the convex world. We use rigidity assumptions, ubiquitous in learning, making our method applicable to many optimization problems. Our approach takes advantage of hidden regularity properties and allows us to devise a simple algorithm of ridge type. An original feature of our method is that it comes with automatic step size adaptation, which departs from the usual overly cautious backtracking methods. In a general framework, we provide theoretical convergence guarantees and rates. We apply our findings to simple GAN problems, obtaining promising numerical results.
Submitted 17 July, 2020;
originally announced July 2020.
-
ARPEGE Cloud Cover Forecast Post-Processing with Convolutional Neural Network
Authors:
Florian Dupuy,
Olivier Mestre,
Mathieu Serrurier,
Mohamed Chafik Bakkay,
Valentin Kivachuk Burdá,
Naty Citlali Cabrera-Gutiérrez,
Jean-Christophe Jouhaud,
Maud-Alix Mader,
Guillaume Oller,
Michaël Zamo
Abstract:
Cloud cover is crucial information for many applications such as planning land observation missions from space. It remains, however, a challenging variable to forecast, and Numerical Weather Prediction (NWP) models suffer from significant biases, hence justifying the use of statistical post-processing techniques. In this study, ARPEGE (Météo-France global NWP) cloud cover is post-processed using a convolutional neural network (CNN). CNNs are the most popular machine learning tool for dealing with images. In our case, the CNN allows the integration of spatial information contained in NWP outputs. We use a gridded cloud cover product derived from satellite observations over Europe as ground truth, and predictors are spatial fields of various variables produced by ARPEGE at the corresponding lead time. We show that a simple U-Net architecture produces significant improvements over Europe. Moreover, the U-Net outclasses more traditional machine learning methods used operationally, such as a random forest and a logistic quantile regression. We introduce a weighting predictor layer prior to the traditional U-Net architecture, which produces a ranking of predictors by importance, facilitating the interpretation of the results. Using $N$ predictors, only $N$ additional weights are trained, which does not impact the computational time, representing a huge advantage compared to traditional methods of ranking (permutation importance, sequential selection, ...).
Submitted 30 June, 2020;
originally announced June 2020.
-
Surrogate Models for Rainfall Nowcasting
Authors:
Naty Citlali Cabrera-Gutiérrez,
Hadrien Godé,
Jean-Christophe Jouhaud,
Mohamed Chafik Bakkay,
Valentin Kivachuk Burdá,
Florian Dupuy,
Maud-Alix Mader,
Olivier Mestre,
Guillaume Oller,
Mathieu Serrurier,
Michaël Zamo
Abstract:
Nowcasting (or short-term weather forecasting) is particularly important in the case of extreme events as it helps prevent human losses. Many of our activities, however, also depend on the weather. Therefore, nowcasting has been shown to be useful in many different domains. Currently, immediate rainfall forecasts in France are calculated using the Arome-NWC model developed by Météo-France, which is a complex physical model. Arome-NWC forecasts are stored with a 15-minute time interval. A higher time resolution is, however, desirable for other meteorological applications. Complex model calculations, such as those of Arome-NWC, can be very expensive and time-consuming. A surrogate model aims at producing results which are very close to the ones obtained using a complex model, but with greatly reduced calculation times. Building a surrogate model requires only a few calculations with the real model. Once the surrogate model is built, further calculations can be carried out quickly. In this study, we propose to build surrogate models for immediate rainfall forecasts with two different approaches: combining Proper Orthogonal Decomposition (POD) and Kriging, or combining POD and Random Forest (RF). We show that results obtained with our surrogate models are not only close to the ones obtained by Arome-NWC, but they also have a higher time resolution (1 minute) with a reduced calculation time.
Submitted 25 June, 2020;
originally announced June 2020.
-
Achieving robustness in classification using optimal transport with hinge regularization
Authors:
Mathieu Serrurier,
Franck Mamalet,
Alberto González-Sanz,
Thibaut Boissin,
Jean-Michel Loubes,
Eustasio del Barrio
Abstract:
Adversarial examples have pointed out the vulnerability of Deep Neural Networks to small local noise. It has been shown that constraining their Lipschitz constant should enhance robustness, but makes them harder to train with classical loss functions. We propose a new framework for binary classification, based on optimal transport, which integrates this Lipschitz constraint as a theoretical requirement. We propose to learn 1-Lipschitz networks using a new loss that is a hinge-regularized version of the Kantorovich-Rubinstein dual formulation for the Wasserstein distance estimation. This loss function has a direct interpretation in terms of adversarial robustness, together with a certifiable robustness bound. We also prove that this hinge-regularized version is still the dual formulation of an optimal transportation problem, and has a solution. We also establish several geometrical properties of this optimal solution, and extend the approach to multi-class problems. Experiments show that the proposed approach provides the expected guarantees in terms of robustness without any significant accuracy drop. The adversarial examples, on the proposed models, visibly and meaningfully change the input, providing an explanation for the classification.
Submitted 26 April, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
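To make the loss described in the entry above more concrete, here is a minimal NumPy sketch of a hinge-regularized Kantorovich-Rubinstein objective for binary labels in {-1, +1}. It is an illustration under assumptions, not the authors' code: the sign convention, the default weight `lam`, the default `margin`, and the function name `hkr_loss` are all choices made for this example. The 1-Lipschitz constraint on the network producing `scores` is assumed to be enforced elsewhere (for instance by the architecture) and is not shown.

```python
import numpy as np

def hkr_loss(scores, labels, lam=10.0, margin=1.0):
    """Hinge-regularized Kantorovich-Rubinstein loss (illustrative sketch).

    scores: real-valued outputs f(x) of a network assumed to be 1-Lipschitz.
    labels: array of -1 / +1 class labels; both classes must be present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # KR term: minimizing it pushes scores up on class +1 and down on class -1.
    kr = scores[labels < 0].mean() - scores[labels > 0].mean()
    # Hinge term: penalizes samples that are misclassified or inside the margin.
    hinge = np.maximum(0.0, margin - labels * scores).mean()
    return float(kr + lam * hinge)

# Toy usage: well-separated scores give a negative loss with no hinge penalty.
separated = hkr_loss([2.0, -2.0], [1, -1])
collapsed = hkr_loss([0.0, 0.0], [1, -1])
```

The weight `lam` trades the transport term against the margin penalty; the value used here is arbitrary.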
-
Dyslexia and Dysgraphia prediction: A new machine learning approach
Authors:
Gilles Richard,
Mathieu Serrurier
Abstract:
Learning disabilities such as dysgraphia, dyslexia, and dyspraxia interfere with academic achievement and also have long-term consequences beyond the school years. It is widely accepted that between 5% and 10% of the world population is affected by this kind of disability. To assess such disabilities in early childhood, children have to complete a battery of tests. Human experts score these tests and decide, on the basis of the marks, whether the children require a specific education strategy. The assessment can be lengthy, costly and emotionally painful. In this paper, we investigate how Artificial Intelligence can help in automating this assessment. Gathering a dataset of pictures of handwritten text and audio recordings, both from standard children and from dyslexic and/or dysgraphic children, we apply machine learning techniques for classification in order to analyze the differences between dyslexic/dysgraphic and standard readers/writers and to build a model. The model is trained on simple features obtained by analysing the pictures and the audio files. Our preliminary implementation shows relatively high performance on the dataset we have used. This suggests the possibility of screening for dyslexia and dysgraphia accurately via non-invasive methods as soon as enough data are available.
Submitted 15 April, 2020;
originally announced May 2020.
-
Estimation of conditional mixture Weibull distribution with right-censored data using neural network for time-to-event analysis
Authors:
Achraf Bennis,
Sandrine Mouysset,
Mathieu Serrurier
Abstract:
In this paper, we consider survival analysis with right-censored data, which is a common situation in predictive maintenance and the health field. We propose a model based on the estimation of a two-parameter Weibull distribution conditionally on the features. To achieve this result, we describe a neural network architecture and the associated loss functions that take into account the right-censored data. We extend the approach to a finite mixture of two-parameter Weibull distributions. We first validate that our model is able to precisely estimate the right parameters of the conditional Weibull distribution on synthetic datasets. In numerical experiments on two real-world datasets (METABRIC and SEER), our model outperforms the state-of-the-art methods. We also demonstrate that our approach can consider any survival time horizon.
Submitted 21 February, 2020;
originally announced February 2020.
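For the entry above, a right-censored likelihood for a two-parameter Weibull distribution can be sketched as follows. This is the standard textbook censored negative log-likelihood, written in NumPy for illustration; it is not taken from the paper, and the function name and arguments are assumptions. In the setting the abstract describes, `shape` and `scale` would be per-sample outputs of a neural network; here they are plain numbers or arrays, and the mixture extension is not shown.

```python
import numpy as np

def weibull_censored_nll(t, event, shape, scale):
    """Mean negative log-likelihood of Weibull(shape, scale) with right censoring.

    t:     observed times (event time, or censoring time if censored), t > 0.
    event: 1 if the event was observed, 0 if the sample is right-censored.
    """
    t, event, shape, scale = (
        np.asarray(a, dtype=float) for a in (t, event, shape, scale)
    )
    z = (t / scale) ** shape  # cumulative hazard at time t
    # log f(t) = log k - log lam + (k - 1) * log(t / lam) - (t / lam)^k
    log_pdf = (np.log(shape) - np.log(scale)
               + (shape - 1.0) * (np.log(t) - np.log(scale)) - z)
    # log S(t) = -(t / lam)^k, used when only "survived past t" is known
    log_surv = -z
    return float(np.mean(-(event * log_pdf + (1.0 - event) * log_surv)))

# Toy usage: with shape = scale = 1 the model reduces to a unit exponential.
nll = weibull_censored_nll(t=[1.0, 2.0], event=[1, 0], shape=1.0, scale=1.0)
```

Observed events contribute the log-density, while censored samples contribute only the log-survival term, which is how the censoring information enters the loss.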
-
From Shallow to Deep Interactions Between Knowledge Representation, Reasoning and Machine Learning (Kay R. Amel group)
Authors:
Zied Bouraoui,
Antoine Cornuéjols,
Thierry Denœux,
Sébastien Destercke,
Didier Dubois,
Romain Guillaume,
João Marques-Silva,
Jérôme Mengin,
Henri Prade,
Steven Schockaert,
Mathieu Serrurier,
Christel Vrain
Abstract:
This paper proposes a tentative and original survey of meeting points between Knowledge Representation and Reasoning (KRR) and Machine Learning (ML), two areas which have been developing quite separately in the last three decades. Some common concerns are identified and discussed such as the types of used representation, the roles of knowledge and data, the lack or the excess of information, or the need for explanations and causal understanding. Then some methodologies combining reasoning and learning are reviewed (such as inductive logic programming, neuro-symbolic reasoning, formal concept analysis, rule-based representations and ML, uncertainty in ML, or case-based reasoning and analogical reasoning), before discussing examples of synergies between KRR and ML (including topics such as belief functions on regression, EM algorithm versus revision, the semantic description of vector representations, the combination of deep learning with high level inference, knowledge graph completion, declarative frameworks for data mining, or preferences and recommendation). This paper is the first step of a work in progress aiming at a better mutual understanding of research in KRR and ML, and how they could cooperate.
Submitted 13 December, 2019;
originally announced December 2019.
-
Learning Disentangled Representations via Mutual Information Estimation
Authors:
Eduardo Hugo Sanchez,
Mathieu Serrurier,
Mathias Ortner
Abstract:
In this paper, we investigate the problem of learning disentangled representations. Given a pair of images sharing some attributes, we aim to create a low-dimensional representation which is split into two parts: a shared representation that captures the common information between the images and an exclusive representation that contains the specific information of each image. To address this issue, we propose a model based on mutual information estimation without relying on image reconstruction or image generation. Mutual information maximization is performed to capture the attributes of data in the shared and exclusive representations while we minimize the mutual information between the shared and exclusive representation to enforce representation disentanglement. We show that these representations are useful to perform downstream tasks such as image classification and image retrieval based on the shared or exclusive component. Moreover, classification results show that our model outperforms the state-of-the-art model based on VAE/GAN approaches in representation disentanglement.
Submitted 9 December, 2019;
originally announced December 2019.
-
Learning Disentangled Representations of Satellite Image Time Series
Authors:
Eduardo Sanchez,
Mathieu Serrurier,
Mathias Ortner
Abstract:
In this paper, we investigate how to learn a suitable representation of satellite image time series in an unsupervised manner by leveraging large amounts of unlabeled data. Additionally, we aim to disentangle the representation of time series into two representations: a shared representation that captures the common information between the images of a time series and an exclusive representation that contains the specific information of each image of the time series. To address these issues, we propose a model that combines a novel component called cross-domain autoencoders with the variational autoencoder (VAE) and generative adversarial network (GAN) methods. In order to learn disentangled representations of time series, our model learns the multimodal image-to-image translation task. We train our model using satellite image time series from the Sentinel-2 mission. Several experiments are carried out to evaluate the obtained representations. We show that these disentangled representations can be very useful to perform multiple tasks such as image classification, image retrieval, image segmentation and change detection.
Submitted 21 March, 2019;
originally announced March 2019.
-
Predictive Interval Models for Non-parametric Regression
Authors:
Mohammad Ghasemi Hamed,
Mathieu Serrurier,
Nicolas Durand
Abstract:
Given a regression model, we are interested in finding two-sided intervals that are guaranteed to contain at least a desired proportion of the conditional distribution of the response variable given a specific combination of predictors. We name such intervals predictive intervals. This work presents a new method to find two-sided predictive intervals for non-parametric least squares regression without the homoscedasticity assumption. Our predictive intervals are built by using tolerance intervals on prediction errors in the query point's neighborhood. We propose a predictive interval model test and also use it as a constraint in our hyper-parameter tuning algorithm. This gives an algorithm that finds the smallest reliable predictive intervals for a given dataset. We also introduce a measure for comparing different interval prediction methods yielding intervals of different sizes and coverage. Our experiments show that our methods are more reliable, effective and precise than other interval prediction methods.
Submitted 21 March, 2016; v1 submitted 24 February, 2014;
originally announced February 2014.