-
HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms
Authors:
Uddeshya Upadhyay,
Sairam Bade,
Arjun Puranik,
Shahir Asfahan,
Melwin Babu,
Francisco Lopez-Jimenez,
Samuel J. Asirvatham,
Ashim Prasad,
Ajit Rajasekharan,
Samir Awasthi,
Rakesh Barve
Abstract:
The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc., has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals effectively. However, previous research has primarily focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. To address these challenges, we propose HypUC, a framework for imbalanced probabilistic regression in medical time series, making several contributions. (i) We introduce a simple kernel density-based technique to tackle the imbalanced regression problem with medical time series. (ii) Moreover, we employ a probabilistic regression framework that allows uncertainty estimation for the predicted continuous values. (iii) We also present a new approach to calibrate the predicted uncertainty further. (iv) Finally, we demonstrate a technique to use calibrated uncertainty estimates to improve the predicted continuous value and show the efficacy of the calibrated uncertainty estimates to flag unreliable predictions. HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients, outperforming several conventional baselines on various diagnostic tasks, suggesting a potential use-case for the reliable clinical deployment of deep learning models.
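The abstract does not spell out the kernel density-based technique. One common form of density-based reweighting for imbalanced regression, sketched below purely as an assumption, estimates the label density with a Gaussian KDE and weights each training sample inversely to that density, so rare (abnormal) target values count more in the loss. The helper names `gaussian_kde` and `kde_inverse_weights` and the `bandwidth` value are hypothetical, not taken from the paper.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return a function estimating the density of `points` with a Gaussian kernel."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return density

def kde_inverse_weights(targets, bandwidth=1.0):
    """Hypothetical helper: weight each target inversely to its estimated label
    density, normalised so that the weights average to 1."""
    density = gaussian_kde(targets, bandwidth)
    raw = [1.0 / density(y) for y in targets]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]

# Illustrative skewed labels: many "normal" values near 60 and one rare abnormal value.
labels = [58.0, 60.0, 61.0, 59.5, 60.5, 95.0]
weights = kde_inverse_weights(labels, bandwidth=2.0)
```

The resulting weights would multiply a per-sample regression loss; the isolated label at 95.0 receives the largest weight because its estimated density is lowest.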
Submitted 23 November, 2023;
originally announced November 2023.
-
Dependence model assessment and selection with DecoupleNets
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
Neural networks are suggested for learning a map from $d$-dimensional samples with any underlying dependence structure to multivariate uniformity in $d'$ dimensions. This map, termed DecoupleNet, is used for dependence model assessment and selection. If the data-generating dependence model were known, and if it were among the few analytically tractable ones, one such transformation for $d'=d$ would be Rosenblatt's transform. DecoupleNets have multiple advantages. For example, they only require an available sample and are applicable to $d'<d$, in particular $d'=2$. This allows for simpler model assessment and selection, both numerically and, because $d'=2$, especially graphically. A graphical assessment method has the advantage of being able to identify why, or in which region of the domain, a candidate model does not provide an adequate fit, thus leading to model selection in particular regions of interest or improved model building strategies in such regions. Through simulation studies with data from various copulas, the feasibility and validity of this novel DecoupleNet approach are demonstrated. Applications to real-world data illustrate its usefulness for model assessment and selection.
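To illustrate the known-model case the abstract mentions, the sketch below applies Rosenblatt's transform for a bivariate Gaussian copula (an example model chosen here, not one singled out by the paper): $u_1$ is kept and $u_2$ is replaced by $\Phi\big((\Phi^{-1}(u_2) - \rho\,\Phi^{-1}(u_1))/\sqrt{1-\rho^2}\big)$, which yields independent uniforms. A DecoupleNet would instead learn such a map from the sample alone. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_gaussian_copula(n, rho, rng):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula with correlation rho."""
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        out.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return out

def rosenblatt_gaussian(u1, u2, rho):
    """Rosenblatt transform of the bivariate Gaussian copula: the second output
    is the conditional CDF of U2 given U1 = u1, so outputs are independent U(0,1)."""
    x1, x2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    return u1, STD_NORMAL.cdf((x2 - rho * x1) / math.sqrt(1 - rho ** 2))

def pearson(pairs):
    """Sample Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
sample = sample_gaussian_copula(2000, rho=0.8, rng=rng)
decoupled = [rosenblatt_gaussian(u1, u2, 0.8) for u1, u2 in sample]
```

Before the transform the two coordinates are strongly correlated; afterwards their correlation is close to zero, which is the "multivariate uniformity" that model assessment checks for.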
Submitted 5 October, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
RafterNet: Probabilistic predictions in multi-response regression
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
A fully nonparametric approach for making probabilistic predictions in multi-response regression problems is introduced. Random forests are used as marginal models for each response variable and, as the novel contribution of the present work, the dependence between the multiple response variables is modeled by a generative neural network. This combined modeling approach of random forests, corresponding empirical marginal residual distributions and a generative neural network is referred to as RafterNet. Multiple datasets serve as examples to demonstrate the flexibility of the approach and its impact for making probabilistic forecasts.
Submitted 11 October, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Differentiable Spline Approximations
Authors:
Minsu Cho,
Aditya Balu,
Ameya Joshi,
Anjana Deva Prasad,
Biswajit Khara,
Soumik Sarkar,
Baskar Ganapathysubramanian,
Adarsh Krishnamurthy,
Chinmay Hegde
Abstract:
The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend gradient-based optimization to functions well modeled by splines, which encompass a large family of piecewise polynomial models. We derive the form of the (weak) Jacobian of such functions and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. Overall, we show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications such as image segmentation, 3D point cloud reconstruction, and finite element analysis.
Submitted 4 October, 2021;
originally announced October 2021.
-
Heavy-tailed Streaming Statistical Estimation
Authors:
Che-Ping Tsai,
Adarsh Prasad,
Sivaraman Balakrishnan,
Pradeep Ravikumar
Abstract:
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. This could also be viewed as stochastic optimization under heavy-tailed distributions, with an additional $O(p)$ space complexity constraint. We design a clipped stochastic gradient descent algorithm and provide an improved analysis, under a more nuanced condition on the noise of the stochastic gradients, which we show is critical when analyzing stochastic optimization problems arising from general statistical estimation problems. Our results guarantee convergence not just in expectation but with exponential concentration, and moreover do so using $O(1)$ batch size. We provide consequences of our results for mean estimation and linear regression. Finally, we provide empirical corroboration of our results and algorithms via synthetic experiments for mean estimation and linear regression.
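A minimal sketch of clipped SGD for streaming mean estimation, assuming the loss $f(\theta) = \tfrac12\lVert\theta - x\rVert^2$, batch size 1, a $1/(t+1)$ step size and a fixed clipping threshold `tau`; the step schedule and threshold are illustrative choices, not the ones analyzed in the paper. Only the current iterate is stored, so memory is $O(p)$.

```python
import math
import random

def clip(v, tau):
    """Rescale vector v so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(c * c for c in v))
    return v if norm <= tau else [c * tau / norm for c in v]

def clipped_sgd_mean(stream, dim, tau, step=lambda t: 1.0 / (t + 1)):
    """Streaming mean estimate: SGD on 0.5 * ||theta - x||^2 with clipped gradients."""
    theta = [0.0] * dim
    for t, x in enumerate(stream):
        grad = [th - xi for th, xi in zip(theta, x)]  # gradient of the per-sample loss
        g = clip(grad, tau)
        eta = step(t)
        theta = [th - eta * gi for th, gi in zip(theta, g)]
    return theta

rng = random.Random(1)
true_mean = [3.0, -2.0]

def heavy_tailed_sample():
    # Symmetric Pareto-type noise (alpha = 2.5): finite variance, heavy polynomial tails.
    return [m + rng.choice((-1.0, 1.0)) * (rng.paretovariate(2.5) - 1.0) for m in true_mean]

estimate = clipped_sgd_mean((heavy_tailed_sample() for _ in range(20000)), dim=2, tau=5.0)
```

Without clipping, this step size makes `theta` exactly the running sample mean; clipping bounds the influence of any single heavy-tailed sample.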
Submitted 25 February, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
On Proximal Policy Optimization's Heavy-tailed Gradients
Authors:
Saurabh Garg,
Joshua Zhanson,
Emilio Parisotto,
Adarsh Prasad,
J. Zico Kolter,
Zachary C. Lipton,
Sivaraman Balakrishnan,
Ruslan Salakhutdinov,
Pradeep Ravikumar
Abstract:
Modern policy gradient algorithms such as Proximal Policy Optimization (PPO) rely on an arsenal of heuristics, including loss clipping and gradient clipping, to ensure successful learning. These heuristics are reminiscent of techniques from robust statistics, commonly used for estimation in outlier-rich (``heavy-tailed'') regimes. In this paper, we present a detailed empirical study to characterize the heavy-tailed nature of the gradients of the PPO surrogate reward function. We demonstrate that the gradients, especially for the actor network, exhibit pronounced heavy-tailedness and that it increases as the agent's policy diverges from the behavioral policy (i.e., as the agent goes further off policy). Further examination implicates the likelihood ratios and advantages in the surrogate reward as the main sources of the observed heavy-tailedness. We then highlight issues arising due to the heavy-tailed nature of the gradients. In this light, we study the effects of the standard PPO clipping heuristics, demonstrating that these tricks primarily serve to offset heavy-tailedness in gradients. Thus motivated, we propose incorporating GMOM, a high-dimensional robust estimator, into PPO as a substitute for three clipping tricks. Despite requiring less hyperparameter tuning, our method matches the performance of PPO (with all heuristics enabled) on a battery of MuJoCo continuous control tasks.
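The abstract names GMOM, a geometric median-of-means estimator, as the robust replacement for clipping. The sketch below shows one standard way to compute it: split per-sample gradient vectors into buckets, average each bucket, then take the geometric median of the bucket means via Weiszfeld iterations. The bucket layout, iteration count, and how the result plugs into a PPO update are assumptions here, and the helper names are hypothetical.

```python
import math

def bucket_means(vectors, k):
    """Split vectors into k contiguous buckets and return each bucket's mean."""
    n, d = len(vectors), len(vectors[0])
    size = n // k
    means = []
    for b in range(k):
        chunk = vectors[b * size:(b + 1) * size] if b < k - 1 else vectors[b * size:]
        means.append([sum(v[j] for v in chunk) / len(chunk) for j in range(d)])
    return means

def geometric_median(points, iters=200, eps=1e-9):
    """Weiszfeld iterations for the geometric median of a list of vectors."""
    d = len(points[0])
    z = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(iters):
        # Inverse-distance weights; eps guards against division by zero.
        weights = [1.0 / max(math.sqrt(sum((pj - zj) ** 2 for pj, zj in zip(p, z))), eps)
                   for p in points]
        total = sum(weights)
        z = [sum(w * p[j] for w, p in zip(weights, points)) / total for j in range(d)]
    return z

def gmom(vectors, k):
    """Geometric median-of-means estimate of the mean of `vectors`."""
    return geometric_median(bucket_means(vectors, k))

# Per-sample "gradients": mostly (1, 1), plus a few huge heavy-tailed outliers.
grads = [[1.0, 1.0] for _ in range(95)] + [[1000.0, -1000.0] for _ in range(5)]
naive = [sum(g[j] for g in grads) / len(grads) for j in range(2)]
robust = gmom(grads, k=10)
```

The naive average is dragged to about (50.95, -49.05) by five outliers, while the GMOM estimate stays near (1, 1) because the outliers corrupt only one of ten bucket means.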
Submitted 12 July, 2021; v1 submitted 20 February, 2021;
originally announced February 2021.
-
Applications of multivariate quasi-random sampling with neural networks
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
Generative moment matching networks (GMMNs) are suggested for modeling the cross-sectional dependence between stochastic processes. The stochastic processes considered are geometric Brownian motions and ARMA-GARCH models. Geometric Brownian motions lead to an application of pricing American basket call options under dependence and ARMA-GARCH models lead to an application of simulating predictive distributions. In both types of applications the benefit of using GMMNs in comparison to parametric dependence models is highlighted and the fact that GMMNs can produce dependent quasi-random samples with no additional effort is exploited to obtain variance reduction.
Submitted 27 August, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Robust Linear Regression: Optimal Rates in Polynomial Time
Authors:
Ainesh Bakshi,
Adarsh Prasad
Abstract:
We obtain robust and computationally efficient estimators for learning several linear models that achieve statistically optimal convergence rates under minimal distributional assumptions. Concretely, we assume our data is drawn from a $k$-hypercontractive distribution and an $ε$-fraction is adversarially corrupted. We then describe an estimator that converges to the optimal least-squares minimizer for the true distribution at a rate proportional to $ε^{2-2/k}$, when the noise is independent of the covariates. We note that no such estimator was known prior to our work, even with access to unbounded computation. The rate we achieve is information-theoretically optimal and thus we resolve the main open question in Klivans, Kothari and Meka [COLT'18].
Our key insight is to identify an analytic condition that serves as a polynomial relaxation of independence of random variables. In particular, we show that when the moments of the noise and covariates are negatively-correlated, we obtain the same rate as independent noise. Further, when the condition is not satisfied, we obtain a rate proportional to $ε^{2-4/k}$, and again match the information-theoretic lower bound. Our central technical contribution is to algorithmically exploit independence of random variables in the "sum-of-squares" framework by formulating it as the aforementioned polynomial inequality.
Submitted 4 December, 2020; v1 submitted 29 June, 2020;
originally announced July 2020.
-
Learning Minimax Estimators via Online Learning
Authors:
Kartik Gupta,
Arun Sai Suggala,
Adarsh Prasad,
Praneeth Netrapalli,
Pradeep Ravikumar
Abstract:
We consider the problem of designing minimax estimators for estimating the parameters of a probability distribution. Unlike classical approaches such as the MLE and minimum distance estimators, we consider an algorithmic approach for constructing such estimators. We view the problem of designing minimax estimators as finding a mixed strategy Nash equilibrium of a zero-sum game. By leveraging recent results in online learning with non-convex losses, we provide a general algorithm for finding a mixed-strategy Nash equilibrium of general non-convex non-concave zero-sum games. Our algorithm requires access to two subroutines: (a) one which outputs a Bayes estimator corresponding to a given prior probability distribution, and (b) one which computes the worst-case risk of any given estimator. Given access to these two subroutines, we show that our algorithm outputs both a minimax estimator and a least favorable prior. To demonstrate the power of this approach, we use it to construct provably minimax estimators for classical problems such as estimation in the finite Gaussian sequence model, and linear regression.
Submitted 19 June, 2020;
originally announced June 2020.
-
Multivariate time-series modeling with generative neural networks
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
Generative moment matching networks (GMMNs) are introduced as dependence models for the joint innovation distribution of multivariate time series (MTS). Following the popular copula-GARCH approach for modeling dependent MTS data, a framework based on a GMMN-GARCH approach is presented. First, ARMA-GARCH models are utilized to capture the serial dependence within each univariate marginal time series. Second, if the number of marginal time series is large, principal component analysis (PCA) is used as a dimension-reduction step. Last, the remaining cross-sectional dependence is modeled via a GMMN, the main contribution of this work. GMMNs are highly flexible and easy to simulate from, which is a major advantage over the copula-GARCH approach. Applications involving yield curve modeling and the analysis of foreign exchange-rate returns demonstrate the utility of the GMMN-GARCH approach, especially in terms of producing better empirical predictive distributions and making better probabilistic forecasts.
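The first step of the GMMN-GARCH pipeline removes serial dependence from each marginal series. The sketch below assumes a zero-mean series and an already fitted GARCH(1,1) with known `omega`, `alpha`, `beta` (no ARMA part and no parameter fitting). It filters returns into standardized residuals $z_t = \varepsilon_t / \sigma_t$ with $\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2$. These residuals, possibly after PCA, are what the GMMN would model jointly. Function names are hypothetical.

```python
import math
import random

def garch11_standardize(eps, omega, alpha, beta, sigma2_0):
    """Filter a zero-mean return series through a GARCH(1,1) with given parameters
    and return the standardized residuals z_t = eps_t / sigma_t."""
    sigma2 = sigma2_0
    z = []
    for t, e in enumerate(eps):
        if t > 0:
            sigma2 = omega + alpha * eps[t - 1] ** 2 + beta * sigma2
        z.append(e / math.sqrt(sigma2))
    return z

def simulate_garch11(innovations, omega, alpha, beta, sigma2_0):
    """Generate GARCH(1,1) returns from given standardized innovations."""
    sigma2, prev, out = sigma2_0, None, []
    for t, zt in enumerate(innovations):
        if t > 0:
            sigma2 = omega + alpha * prev ** 2 + beta * sigma2
        prev = zt * math.sqrt(sigma2)
        out.append(prev)
    return out

rng = random.Random(4)
omega, alpha, beta = 0.05, 0.1, 0.85
sigma2_0 = omega / (1 - alpha - beta)  # unconditional variance (requires alpha + beta < 1)
innovations = [rng.gauss(0.0, 1.0) for _ in range(500)]
returns = simulate_garch11(innovations, omega, alpha, beta, sigma2_0)
residuals = garch11_standardize(returns, omega, alpha, beta, sigma2_0)
```

With the true parameters, filtering recovers the innovations that generated the series, which is the property the dependence model relies on.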
Submitted 1 October, 2021; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Data-Centric Mixed-Variable Bayesian Optimization For Materials Design
Authors:
Akshay Iyer,
Yichi Zhang,
Aditya Prasad,
Siyu Tao,
Yixing Wang,
Linda Schadler,
L Catherine Brinson,
Wei Chen
Abstract:
Materials design can be cast as an optimization problem with the goal of achieving desired properties, by varying material composition, microstructure morphology, and processing conditions. The existence of both qualitative and quantitative material design variables leads to disjoint regions in property space, making the search for optimal design challenging. Limited availability of experimental data and the high cost of simulations magnify the challenge. This situation calls for design methodologies that can extract useful information from existing data and guide the search for optimal designs efficiently. To this end, we present a data-centric, mixed-variable Bayesian Optimization framework that integrates data from literature, experiments, and simulations for knowledge discovery and computational materials design. Our framework pivots around the Latent Variable Gaussian Process (LVGP), a novel Gaussian Process technique which projects qualitative variables on a continuous latent space for covariance formulation, as the surrogate model to quantify "lack of data" uncertainty. Expected improvement, an acquisition criterion that balances exploration and exploitation, helps navigate a complex, nonlinear design space to locate the optimum design. The proposed framework is tested through a case study which seeks to concurrently identify the optimal composition and morphology for insulating polymer nanocomposites. We also present an extension of mixed-variable Bayesian Optimization for multiple objectives to identify the Pareto Frontier within tens of iterations. These findings project Bayesian Optimization as a powerful tool for design of engineered material systems.
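Expected improvement only needs the surrogate's posterior mean and standard deviation at a candidate, so it works the same whether the surrogate is an LVGP or an ordinary GP. The sketch below assumes maximization and a Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$: $\mathrm{EI} = (\mu - f^* - \xi)\,\Phi(z) + \sigma\,\varphi(z)$ with $z = (\mu - f^* - \xi)/\sigma$. The candidate names and posterior values are made up for illustration.

```python
from statistics import NormalDist

_N = NormalDist()

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` (maximization) under N(mu, sigma^2)."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)  # no uncertainty: improvement is deterministic
    z = improvement / sigma
    return improvement * _N.cdf(z) + sigma * _N.pdf(z)

# Hypothetical surrogate posteriors (mean, std) at two candidate designs.
candidates = {"A": (1.0, 0.1), "B": (0.9, 0.5)}
best_observed = 1.0
pick = max(candidates, key=lambda name: expected_improvement(*candidates[name], best=best_observed))
```

Here candidate "B" is chosen despite its lower mean, because its larger uncertainty gives it more expected upside. This is the exploration/exploitation balance the abstract refers to.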
Submitted 4 July, 2019;
originally announced July 2019.
-
A Unified Approach to Robust Mean Estimation
Authors:
Adarsh Prasad,
Sivaraman Balakrishnan,
Pradeep Ravikumar
Abstract:
In this paper, we develop connections between two seemingly disparate, but central, models in robust statistics: Huber's epsilon-contamination model and the heavy-tailed noise model. We provide conditions under which this connection provides near-statistically-optimal estimators. Building on this connection, we provide a simple variant of recent computationally-efficient algorithms for mean estimation in Huber's model, which, given our connection, entails that the same efficient sample-pruning-based estimator is simultaneously robust to heavy-tailed noise and Huber contamination. Furthermore, we complement our efficient algorithms with statistically-optimal albeit computationally intractable estimators, which are simultaneously optimally robust in both models. We study the empirical performance of our proposed estimators on synthetic datasets, and find that our methods convincingly outperform a variety of practical baselines.
Submitted 1 July, 2019;
originally announced July 2019.
-
Quasi-random sampling for multivariate distributions via generative neural networks
Authors:
Marius Hofert,
Avinash Prasad,
Mu Zhu
Abstract:
Generative moment matching networks (GMMNs) are introduced for generating quasi-random samples from multivariate models with any underlying copula in order to compute estimates under variance reduction. So far, quasi-random sampling for multivariate distributions required a careful design, exploiting specific properties (such as conditional distributions) of the implied parametric copula or the underlying quasi-Monte Carlo (QMC) point set, and was only tractable for a small number of models. Utilizing GMMNs allows one to construct quasi-random samples for a much larger variety of multivariate distributions without such restrictions, including empirical ones from real data with dependence structures not well captured by parametric copulas. Once trained on pseudo-random samples from a parametric model or on real data, these neural networks only require a multivariate standard uniform randomized QMC point set as input and are thus fast in estimating expectations of interest under dependence with variance reduction. Numerical examples are considered to demonstrate the approach, including applications inspired by risk management practice. All results are reproducible with the demos GMMN_QMC_paper, GMMN_QMC_data and GMMN_QMC_timings as part of the R package gnn.
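GMMNs are trained by minimizing the maximum mean discrepancy (MMD) between generated samples and training samples. The sketch below computes the biased (V-statistic) estimate of squared MMD with a Gaussian kernel. The single fixed `bandwidth` is an assumption, and this is not the `gnn` package's implementation, whose API is not reproduced here.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

rng = random.Random(3)
target = [[rng.random(), rng.random()] for _ in range(150)]
same_dist = [[rng.random(), rng.random()] for _ in range(150)]
shifted = [[rng.random() + 0.5, rng.random() + 0.5] for _ in range(150)]
```

A generator whose output matches the target distribution drives this quantity toward zero, while a mismatched one (here, `shifted`) leaves it clearly positive. That gap is the training signal for a GMMN.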
Submitted 2 April, 2020; v1 submitted 1 November, 2018;
originally announced November 2018.
-
Multi-agent Deep Reinforcement Learning for Zero Energy Communities
Authors:
Amit Prasad,
Ivana Dusparic
Abstract:
Advances in renewable energy generation and the introduction of government targets to improve energy efficiency gave rise to the concept of a Zero Energy Building (ZEB). A ZEB is a building whose net energy usage over a year is zero, i.e., its energy use is not larger than its overall renewables generation. A collection of ZEBs forms a Zero Energy Community (ZEC). This paper addresses the problem of energy sharing in such a community. This differs from previously addressed energy sharing between buildings, as our focus is on the improvement of community energy status, while previous research has traditionally focused on reducing losses due to transmission and storage, or on achieving economic gains. We model this problem in a multi-agent environment and propose a Deep Reinforcement Learning (DRL) based solution. Each building is represented by an intelligent agent that learns over time the appropriate behaviour to share energy. We have evaluated the proposed solution in a multi-agent simulation built using osBrain. Results indicate that with time agents learn to collaborate and learn a policy comparable to the optimal policy, which in turn improves the ZEC's energy status. Buildings with no renewables preferred to request energy from their neighbours rather than from the supply grid.
Submitted 27 June, 2019; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Revisiting Adversarial Risk
Authors:
Arun Sai Suggala,
Adarsh Prasad,
Vaishnavh Nagarajan,
Pradeep Ravikumar
Abstract:
Recent works on adversarial perturbations show that there is an inherent trade-off between standard test accuracy and adversarial accuracy. Specifically, they show that no classifier can simultaneously be robust to adversarial perturbations and achieve high standard test accuracy. However, this is contrary to the standard notion that on tasks such as image classification, humans are robust classifiers with low error rate. In this work, we show that the main reason behind this confusion is the inexact definition of adversarial perturbation that is used in the literature. To fix this issue, we propose a slight, yet important modification to the existing definition of adversarial perturbation. Based on the modified definition, we show that there is no trade-off between adversarial and standard accuracies; there exist classifiers that are robust and achieve high standard accuracy. We further study several properties of this new definition of adversarial risk and its relation to the existing definition.
Submitted 22 March, 2019; v1 submitted 7 June, 2018;
originally announced June 2018.
-
Robust Estimation via Robust Gradient Estimation
Authors:
Adarsh Prasad,
Arun Sai Suggala,
Sivaraman Balakrishnan,
Pradeep Ravikumar
Abstract:
We provide a new computationally-efficient class of estimators for risk minimization. We show that these estimators are robust for general statistical models: in the classical Huber epsilon-contamination model and in heavy-tailed settings. Our workhorse is a novel robust variant of gradient descent, and we provide conditions under which our gradient descent variant provides accurate estimators in a general convex risk minimization problem. We provide specific consequences of our theory for linear regression, logistic regression and for estimation of the canonical parameters in an exponential family. These results provide some of the first computationally tractable and provably robust estimators for these canonical statistical models. Finally, we study the empirical performance of our proposed methods on synthetic and real datasets, and find that our methods convincingly outperform a variety of baselines.
Submitted 20 April, 2018; v1 submitted 18 February, 2018;
originally announced February 2018.
-
A framework for measuring dependence between random vectors
Authors:
Marius Hofert,
Wayne Oldford,
Avinash Prasad,
Mu Zhu
Abstract:
A framework for quantifying dependence between random vectors is introduced. With the notion of a collapsing function, random vectors are summarized by single random variables, called collapsed random variables in the framework. Using this framework, a general graphical assessment of independence between groups of random variables for arbitrary collapsing functions is provided. Measures of association computed from the collapsed random variables are then used to measure the dependence between random vectors. To this end, suitable collapsing functions are presented. Furthermore, the notion of a collapsed distribution function and collapsed copula are introduced and investigated for certain collapsing functions. This investigation yields a multivariate extension of the Kendall distribution and its corresponding Kendall copula for which some properties and examples are provided. In addition, non-parametric estimators for the collapsed measures of dependence are provided along with their corresponding asymptotic properties. Finally, data applications to bioinformatics and finance are presented.
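The collapsing idea can be sketched in a few lines: reduce each random vector to a scalar with a collapsing function, then apply an ordinary measure of association to the collapsed variables. The sketch below assumes the sum as the collapsing function and Kendall's tau as the measure. These are illustrative choices; the abstract does not say which collapsing functions the paper recommends. The helper names are hypothetical.

```python
import random

def kendall_tau(xs, ys):
    """Kendall's tau-a between two equally long sequences (O(n^2))."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2.0 * s / (n * (n - 1))

def collapsed_tau(X, Y, collapse=sum):
    """Collapse each random vector to a single random variable, then measure association."""
    return kendall_tau([collapse(x) for x in X], [collapse(y) for y in Y])

rng = random.Random(2)
X = [[rng.random() for _ in range(3)] for _ in range(200)]        # 3-dimensional vectors
Y_dep = [[2.0 * v for v in x] + [1.0] for x in X]                 # 4-dimensional, driven by X
Y_ind = [[rng.random() for _ in range(4)] for _ in range(200)]    # 4-dimensional, independent
```

The two random vectors may have different dimensions, since only their collapsed versions are compared. `collapsed_tau(X, Y_dep)` is essentially 1, while `collapsed_tau(X, Y_ind)` is near 0.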
Submitted 10 January, 2018;
originally announced January 2018.
-
Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets
Authors:
Adarsh Prasad,
Stefanie Jegelka,
Dhruv Batra
Abstract:
To cope with the high level of ambiguity faced in domains such as Computer Vision or Natural Language Processing, robust prediction methods often search for a diverse set of high-quality candidate solutions or proposals. In structured prediction problems, this becomes a daunting task, as the solution space (image labelings, sentence parses, etc.) is exponentially large. We study greedy algorithms for finding a diverse subset of solutions in structured-output spaces by drawing new connections between submodular functions over combinatorial item sets and High-Order Potentials (HOPs) studied for graphical models. Specifically, we show via examples that when marginal gains of submodular diversity functions allow structured representations, this enables efficient (sub-linear time) approximate maximization by reducing the greedy augmentation step to inference in a factor graph with appropriately constructed HOPs. We discuss benefits and trade-offs, and show that our constructions lead to significantly better proposals.
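For reference, the greedy algorithm in question repeatedly adds the item with the largest marginal gain of a monotone submodular function under a cardinality budget. The sketch below is the naive version that enumerates every item explicitly. That is exactly what becomes infeasible in exponentially large structured spaces, where the paper instead performs the augmentation step via HOP inference. The coverage-style diversity function and the toy proposals are illustrative assumptions.

```python
def greedy_max(items, f, k):
    """Standard greedy maximization of a monotone submodular set function f with |S| <= k."""
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for item in items:
            if item in selected:
                continue
            gain = f(selected + [item]) - f(selected)  # marginal gain of adding `item`
            if best is None or gain > best_gain:
                best, best_gain = item, gain
        if best is None:
            break
        selected.append(best)
    return selected

# Toy proposals, each "covering" a set of elements (e.g. labeled regions).
proposals = {"p1": {1, 2, 3, 4}, "p2": {1, 2, 3}, "p3": {5, 6}, "p4": {4, 5}}

def coverage(S):
    """Diversity as the number of distinct elements covered: monotone and submodular."""
    return len(set().union(*(proposals[p] for p in S)))

chosen = greedy_max(list(proposals), coverage, k=2)
```

After picking `p1`, the near-duplicate `p2` has zero marginal gain, so the greedy step prefers the complementary proposal `p3`. Diminishing returns is what makes submodular objectives reward diversity.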
Submitted 6 November, 2014;
originally announced November 2014.