Search | arXiv e-print repository

On the quality of randomized approximations of Tukey's depth

Authors: Simon Briend, Gábor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: Tukey's depth (or halfspace depth) is a widely used measure of centrality for multivariate data. However, exact computation of Tukey's depth is known to be a hard problem in high dimensions. As a remedy, randomized approximations of Tukey's depth have been proposed. In this paper we explore when such randomized algorithms return a good approximation of Tukey's depth. We study the case when the data are sampled from a log-concave isotropic distribution. We prove that, if one requires that the algorithm runs in polynomial time in the dimension, the randomized algorithm correctly approximates the maximal depth $1/2$ and depths close to zero. On the other hand, for any point of intermediate depth, any good approximation requires exponential complexity.

Submitted 7 July, 2025; v1 submitted 11 September, 2023; originally announced September 2023.
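
The abstract above refers to randomized approximations of Tukey's depth without spelling one out. A minimal sketch of one common variant follows, assuming the approximation takes the minimum, over uniformly random unit directions, of the fraction of sample points in the corresponding closed halfspace through the query point. The function name `random_tukey_depth` and its parameters are illustrative and not taken from the paper.

```python
import numpy as np

def random_tukey_depth(x, data, num_directions=1000, rng=None):
    """Approximate the empirical Tukey depth of x with random directions.

    Assumption: directions are drawn uniformly from the unit sphere.
    The result is an upper bound on the exact empirical depth, because
    the exact depth minimizes over all directions, not a finite sample.
    """
    rng = np.random.default_rng(rng)
    dim = data.shape[1]
    # Normalized Gaussian vectors are uniform on the unit sphere.
    directions = rng.standard_normal((num_directions, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Entry (i, j) is <u_j, data_i - x>.
    projections = (data - x) @ directions.T
    # Fraction of points in the closed halfspace {z : <u_j, z - x> <= 0}.
    fractions = (projections <= 0).mean(axis=0)
    return float(fractions.min())

sample = np.random.default_rng(0).standard_normal((500, 2))
print(random_tukey_depth(np.zeros(2), sample, rng=1))            # deep point
print(random_tukey_depth(np.array([100.0, 0.0]), sample, rng=1)) # outlying point
```

The cost is linear in the number of directions, which is the quantity the abstract's polynomial-versus-exponential statement concerns.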

arXiv:2209.02856 [pdf, ps, other]

A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

Authors: Roberto I. Oliveira, Zoraida F. Rico, Philip Thompson

Abstract: We revisit heavy-tailed corrupted least-squares linear regression, given a corrupted $n$-sized label-feature sample with at most $εn$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such a sample of a label-feature pair $(y,x)$ satisfying $y=\langle x,b^*\rangle+ξ$ with heavy-tailed $(x,ξ)$. We only assume $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $Σ$ with minimum eigenvalue $1/μ^2>0$ and bounded condition number $κ>0$. The noise $ξ$ can be arbitrarily dependent on $x$ and nonsymmetric as long as $ξx$ has finite covariance matrix $Ξ$. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(Σ,Ξ)$ or of the operator norm of $Ξ$. With probability at least $1-δ$, our proposed estimator attains the statistical rate $μ^2\Vert Ξ\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/δ)}{n}+ε)^{1/2}$ and breakdown-point $ε\lesssim\frac{1}{L^4κ^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4κ^2(p\log p + \log(1/δ))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm satisfying simultaneously all the mentioned properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langle Σ(\cdot),\cdot\rangle$. The second stage estimates the descent direction $Σ\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing or estimating $Σ$.

Submitted 6 September, 2022; originally announced September 2022.

Comments: 58 pages

arXiv:2006.04005 [pdf, other]

doi 10.1109/TNNLS.2021.3112897

Entropic Out-of-Distribution Detection: Seamless Detection of Unknown Examples

Authors: David Macêdo, Tsang Ing Ren, Cleber Zanchettin, Adriano L. I. Oliveira, Teresa Ludermir

Abstract: In this paper, we argue that the unsatisfactory out-of-distribution (OOD) detection performance of neural networks is mainly due to the SoftMax loss anisotropy and propensity to produce low entropy probability distributions in disagreement with the principle of maximum entropy. Current OOD detection approaches usually do not directly fix the SoftMax loss drawbacks, but rather build techniques to circumvent them. Unfortunately, those methods usually produce undesired side effects (e.g., classification accuracy drop, additional hyperparameters, slower inferences, and collecting extra data). In the opposite direction, we propose replacing SoftMax loss with a novel loss function that does not suffer from the mentioned weaknesses. The proposed IsoMax loss is isotropic (exclusively distance-based) and provides high entropy posterior probability distributions. Replacing the SoftMax loss by IsoMax loss requires no model or training changes. Additionally, the models trained with IsoMax loss produce as fast and energy-efficient inferences as those trained using SoftMax loss. Moreover, no classification accuracy drop is observed. The proposed method does not rely on outlier/background data, hyperparameter tuning, temperature calibration, feature extraction, metric learning, adversarial training, ensemble procedures, or generative models. Our experiments showed that IsoMax loss works as a seamless SoftMax loss drop-in replacement that significantly improves neural networks' OOD detection performance. Hence, it may be used as a baseline OOD detection approach to be combined with current or future OOD detection techniques to achieve even higher results.

Submitted 4 August, 2021; v1 submitted 6 June, 2020; originally announced June 2020.

Comments: Accepted for publication in the IEEE Transactions on Neural Networks and Learning Systems: Special Issue on Deep Learning for Anomaly Detection

arXiv:1908.05569 [pdf, other]

doi 10.1109/IJCNN52387.2021.9533899

Entropic Out-of-Distribution Detection

Authors: David Macêdo, Tsang Ing Ren, Cleber Zanchettin, Adriano L. I. Oliveira, Teresa Ludermir

Abstract: Out-of-distribution (OOD) detection approaches usually present special requirements (e.g., hyperparameter validation, collection of outlier data) and produce side effects (e.g., classification accuracy drop, slower energy-inefficient inferences). We argue that these issues are a consequence of the SoftMax loss anisotropy and disagreement with the maximum entropy principle. Thus, we propose the IsoMax loss and the entropic score. The seamless drop-in replacement of the SoftMax loss by IsoMax loss requires neither additional data collection nor hyperparameter validation. The trained models do not exhibit classification accuracy drop and produce fast energy-efficient inferences. Moreover, our experiments show that training neural networks with IsoMax loss significantly improves their OOD detection performance. The IsoMax loss exhibits state-of-the-art performance under the mentioned conditions (fast energy-efficient inference, no classification accuracy drop, no collection of outlier data, and no hyperparameter validation), which we call the seamless OOD detection task. In future work, current OOD detection methods may replace the SoftMax loss with the IsoMax loss to improve their performance on the commonly studied non-seamless OOD detection problem.

Submitted 24 May, 2021; v1 submitted 15 August, 2019; originally announced August 2019.

Comments: Accepted for publication in The International Joint Conference on Neural Networks (IJCNN), 2021
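
The two abstracts above describe a distance-based ("isotropic") output layer and an entropy-based detection score without giving formulas. The sketch below is only an illustration under stated assumptions: logits are taken to be negative Euclidean distances to per-class prototypes, and the score is taken to be the negative Shannon entropy of the resulting probabilities. The names `distance_logits`, `softmax`, and `negative_entropy_score` are hypothetical, and the papers' exact definitions (for example, any scaling of the distances) may differ.

```python
import numpy as np

def distance_logits(features, prototypes):
    """Assumed isotropic logits: negative Euclidean distance to each prototype."""
    # features: (n, d), prototypes: (c, d) -> result: (n, c)
    diffs = features[:, None, :] - prototypes[None, :, :]
    return -np.linalg.norm(diffs, axis=2)

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def negative_entropy_score(probs, eps=1e-12):
    """Assumed score: negative Shannon entropy; lower values suggest OOD."""
    return (probs * np.log(probs + eps)).sum(axis=1)

prototypes = np.array([[0.0, 0.0], [10.0, 0.0]])
features = np.array([[0.0, 0.0],   # sits on a prototype
                     [5.0, 0.0]])  # equidistant from both prototypes
scores = negative_entropy_score(softmax(distance_logits(features, prototypes)))
print(scores)
```

Under these assumptions, an input equidistant from all prototypes gets a near-uniform distribution and therefore the lowest score, which is the behavior an entropy-based detector would rely on.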

arXiv:1808.05264 [pdf, other]

DeepDownscale: a Deep Learning Strategy for High-Resolution Weather Forecast

Authors: Eduardo R. Rodrigues, Igor Oliveira, Renato L. F. Cunha, Marco A. S. Netto

Abstract: Running high-resolution physical models is computationally expensive and essential for many disciplines. Agriculture, transportation, and energy are sectors that depend on high-resolution weather models, which typically consume many hours of large High Performance Computing (HPC) systems to deliver timely results. Many users cannot afford to run the desired resolution and are forced to use low-resolution output. One simple solution is to interpolate results for visualization. It is also possible to combine an ensemble of low-resolution models to obtain a better prediction. However, these approaches fail to capture the redundant information and patterns in the low-resolution input that could help improve the quality of prediction. In this paper, we propose and evaluate a strategy based on a deep neural network to learn a high-resolution representation from low-resolution predictions using weather forecast as a practical use case. We take a supervised learning approach, since obtaining labeled data can be done automatically. Our results show significant improvement when compared with standard practices and the strategy is still lightweight enough to run on modest computer systems.

Submitted 15 August, 2018; originally announced August 2018.

Comments: 8 pages, 6 figures, accepted for publication at 14th IEEE eScience

arXiv:1808.02707 [pdf]

A Method for Estimating the Probability of Extremely Rare Accidents in Complex Systems

Authors: Ítalo Romani de Oliveira, Jeffery Musiak

Abstract: Estimating the probability of failures or accidents with aerospace systems is often necessary when new concepts or designs are introduced, as is being done for autonomous aircraft. If the design is safe, as it is supposed to be, accident cases are hard to find. Such analysis needs some variance reduction technique, and several algorithms exist for that; however, specific model features may cause difficulties in practice, such as the case of system models where independent agents have to autonomously accomplish missions within finite time, and likely with the presence of human agents. For handling these scenarios, this paper presents a novel estimation approach, based on the combination of the well-established variance reduction technique of Interacting Particles System (IPS) with the long-standing optimization algorithm denominated DIviding RECTangles (DIRECT). When combined, these two techniques yield statistically significant results for extremely low probabilities. In addition, this novel approach allows the identification of intermediate events and simplifies the evaluation of sensitivity of the estimated probabilities to certain system parameters.

Submitted 8 August, 2018; originally announced August 2018.

arXiv:1807.10755 [pdf, other]

A writer-independent approach for offline signature verification using deep convolutional neural networks features

Authors: Victor L. F. Souza, Adriano L. I. Oliveira, Robert Sabourin

Abstract: The use of features extracted using a deep convolutional neural network (CNN) combined with a writer-dependent (WD) SVM classifier resulted in significant improvement in performance of handwritten signature verification (HSV) when compared to the previous state-of-the-art methods. In this work it is investigated whether the use of these CNN features provides good results in a writer-independent (WI) HSV context, based on the dichotomy transformation combined with the use of an SVM writer-independent classifier. The experiments performed in the Brazilian and GPDS datasets show that (i) the proposed approach outperformed other WI-HSV methods from the literature, (ii) in the global threshold scenario, the proposed approach was able to outperform the writer-dependent method with CNN features in the Brazilian dataset, (iii) in a user threshold scenario, the results are similar to those obtained by the writer-dependent method with CNN features.

Submitted 26 July, 2018; originally announced July 2018.

arXiv:1806.09244 [pdf, other]

A Scalable Machine Learning System for Pre-Season Agriculture Yield Forecast

Authors: Igor Oliveira, Renato L. F. Cunha, Bruno Silva, Marco A. S. Netto

Abstract: Yield forecast is essential to agriculture stakeholders and can be obtained with the use of machine learning models and data coming from multiple sources. Most solutions for yield forecast rely on NDVI (Normalized Difference Vegetation Index) data, which is time-consuming to acquire and process. To bring scalability to yield forecast, in the present paper we describe a system that incorporates satellite-derived precipitation and soil properties datasets, seasonal climate forecasting data from physical models and other sources to produce a pre-season prediction of soybean/maize yield, with no need for NDVI data. This system provides significantly useful results by removing the need for high-resolution remote-sensing data and allowing farmers to prepare for adverse climate influence on the crop cycle. In our studies, we forecast the soybean and maize yields for Brazil and USA, which corresponded to 44% of the world's grain production in 2016. Results show the error metrics for soybean and maize yield forecasts are comparable to similar systems that only provide yield forecast information in the first weeks to months of the crop cycle.

Submitted 15 October, 2018; v1 submitted 24 June, 2018; originally announced June 2018.

Comments: 8 pages, 5 figures, Submitted to 14th IEEE eScience

arXiv:1703.08729 [pdf, other]

Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality

Authors: Song Mei, Theodor Misiakiewicz, Andrea Montanari, Roberto I. Oliveira

Abstract: A number of statistical estimation problems can be addressed by semidefinite programs (SDP). While SDPs are solvable in polynomial time using interior point methods, in practice generic SDP solvers do not scale well to high-dimensional problems. In order to cope with this problem, Burer and Monteiro proposed a non-convex rank-constrained formulation, which has good performance in practice but is still poorly understood theoretically. In this paper we study the rank-constrained version of SDPs arising in MaxCut and in synchronization problems. We establish a Grothendieck-type inequality that proves that all the local maxima and dangerous saddle points are within a small multiplicative gap from the global maximum. We use this structural information to prove that SDPs can be solved within a known accuracy, by applying the Riemannian trust-region method to this non-convex problem, while constraining the rank to be of order one. For the MaxCut problem, our inequality implies that any local maximizer of the rank-constrained SDP provides a $(1 - 1/(k-1)) \times 0.878$ approximation of the MaxCut, when the rank is fixed to $k$. We then apply our results to data matrices generated according to the Gaussian ${\mathbb Z}_2$ synchronization problem, and the two-groups stochastic block model with large bounded degree. We prove that the error achieved by local maximizers undergoes a phase transition at the same threshold as for information-theoretically optimal methods.

Submitted 29 March, 2017; v1 submitted 25 March, 2017; originally announced March 2017.

Comments: 38 pages; 9 pdf figures
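
To make the approximation factor $(1 - 1/(k-1)) \times 0.878$ quoted in the abstract above concrete, the short sketch below evaluates it for a few ranks $k$. The function name `rank_k_maxcut_ratio` is illustrative only; the constant 0.878 is the value stated in the abstract.

```python
def rank_k_maxcut_ratio(k, base_ratio=0.878):
    """Evaluate (1 - 1/(k-1)) * base_ratio for a rank k >= 2."""
    if k < 2:
        raise ValueError("the formula needs k >= 2")
    return (1.0 - 1.0 / (k - 1)) * base_ratio

for k in (3, 5, 11, 101):
    print(k, round(rank_k_maxcut_ratio(k), 4))
```

The factor grows with $k$ and tends to 0.878 as $k \to \infty$, so a moderate rank already recovers most of the stated guarantee.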

arXiv:1107.0312 [pdf, ps, other]

Approximate group context tree

Authors: Alexandre Belloni, Roberto I. Oliveira

Abstract: We study a variable length Markov chain model associated with a group of stationary processes that share the same context tree but each process has potentially different conditional probabilities. We propose a new model selection and estimation method which is computationally efficient. We develop oracle and adaptivity inequalities, as well as model selection properties, that hold under continuity of the transition probabilities and polynomial $β$-mixing. In particular, model misspecification is allowed. These results are applied to interesting families of processes. For Markov processes, we obtain uniform rates of convergence for the estimation error of transition probabilities as well as perfect model selection results. For chains of infinite order with complete connections, we obtain explicit uniform rates of convergence on the estimation of conditional probabilities, which have an explicit dependence on the processes' continuity rates. Similar guarantees are also derived for renewal processes. Our results are shown to be applicable to discrete stochastic dynamic programming problems and to dynamic discrete choice models. We also apply our estimator to a linguistic study, based on recent work by Galves et al. (2012), of the rhythmic differences between Brazilian and European Portuguese.

Submitted 30 December, 2015; v1 submitted 1 July, 2011; originally announced July 2011.

Showing 1–10 of 10 results for author: Oliveira, I