Search | arXiv e-print repository

Active, anytime-valid risk controlling prediction sets

Authors: Ziyu Xu, Nikos Karampatziakis, Paul Mineiro

Abstract: Rigorously establishing the safety of black-box machine learning models concerning critical risk measures is important for providing guarantees about model behavior. Recently, Bates et. al. (JACM '24) introduced the notion of a risk controlling prediction set (RCPS) for producing prediction sets that are statistically guaranteed low risk from machine learning models. Our method extends this notion… ▽ More Rigorously establishing the safety of black-box machine learning models concerning critical risk measures is important for providing guarantees about model behavior. Recently, Bates et. al. (JACM '24) introduced the notion of a risk controlling prediction set (RCPS) for producing prediction sets that are statistically guaranteed low risk from machine learning models. Our method extends this notion to the sequential setting, where we provide guarantees even when the data is collected adaptively, and ensures that the risk guarantee is anytime-valid, i.e., simultaneously holds at all time steps. Further, we propose a framework for constructing RCPSes for active labeling, i.e., allowing one to use a labeling policy that chooses whether to query the true label for each received data point and ensures that the expected proportion of data points whose labels are queried are below a predetermined label budget. We also describe how to use predictors (i.e., the machine learning model for which we provide risk control guarantees) to further improve the utility of our RCPSes by estimating the expected risk conditioned on the covariates. We characterize the optimal choices of label policy and predictor under a fixed label budget and show a regret result that relates the estimation error of the optimal labeling policy and predictor to the wealth process that underlies our RCPSes. Lastly, we present practical ways of formulating label policies and empirically show that our label policies use fewer labels to reach higher utility than naive baseline labeling strategies on both simulations and real data. △ Less

Submitted 31 October, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: 22 pages, 3 figures. Accepted to NeurIPS 2024

arXiv:2405.20677 [pdf, other]

Provably Efficient Interactive-Grounded Learning with Personalized Reward

Authors: Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Abstract: Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with contex… ▽ More Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with context-dependent feedback, but their algorithm does not come with theoretical guarantees. In this work, we consider the same problem and provide the first provably efficient algorithms with sublinear regret under realizability. Our analysis reveals that the step-function estimator of prior work can deviate uncontrollably due to finite-sample effects. Our solution is a novel Lipschitz reward estimator which underestimates the true reward and enjoys favorable generalization performances. Building on this estimator, we propose two algorithms, one based on explore-then-exploit and the other based on inverse-gap weighting. We apply IGL to learning from image feedback and learning from text feedback, which are reward-free settings that arise in practice. Experimental results showcase the importance of using our Lipschitz reward estimator and the overall effectiveness of our algorithms. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2302.14248 [pdf, other]

Time-uniform confidence bands for the CDF under nonstationarity

Authors: Paul Mineiro, Steven R. Howard

Abstract: Estimation of the complete distribution of a random variable is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d. setting, but the arbitrary data dependent setting remains largely unaddressed. Consistent with known impossibility results, we present computationally felicitous time-uniform and value-uniform bounds on the CDF… ▽ More Estimation of the complete distribution of a random variable is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d. setting, but the arbitrary data dependent setting remains largely unaddressed. Consistent with known impossibility results, we present computationally felicitous time-uniform and value-uniform bounds on the CDF of the running averaged conditional distribution of a real-valued random variable which are always valid and sometimes trivial, along with an instance-dependent convergence guarantee. The importance-weighted extension is appropriate for estimating complete counterfactual distributions of rewards given controlled experimentation data exhaust, e.g., from an A/B test or a contextual bandit. △ Less

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2210.13573 [pdf, other]

Conditionally Risk-Averse Contextual Bandits

Authors: Mónika Farsang, Paul Mineiro, Wangda Zhang

Abstract: Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless we exhibit the first risk-averse con… ▽ More Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless we exhibit the first risk-averse contextual bandit algorithm with an online regret guarantee. We conduct experiments from diverse scenarios where worst-case outcomes should be avoided, from dynamic pricing, inventory management, and self-tuning software; including a production exascale data processing system. △ Less

Submitted 8 July, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.11133 [pdf, other]

A lower confidence sequence for the changing mean of non-negative right heavy-tailed observations with bounded mean

Authors: Paul Mineiro

Abstract: A confidence sequence (CS) is an anytime-valid sequential inference primitive which produces an adapted sequence of sets for a predictable parameter sequence with a time-uniform coverage guarantee. This work constructs a non-parametric non-asymptotic lower CS for the running average conditional expectation whose slack converges to zero given non-negative right heavy-tailed observations with bounde… ▽ More A confidence sequence (CS) is an anytime-valid sequential inference primitive which produces an adapted sequence of sets for a predictable parameter sequence with a time-uniform coverage guarantee. This work constructs a non-parametric non-asymptotic lower CS for the running average conditional expectation whose slack converges to zero given non-negative right heavy-tailed observations with bounded mean. Specifically, when the variance is finite the approach dominates the empirical Bernstein supermartingale of Howard et. al.; with infinite variance, can adapt to a known or unknown $(1 + δ)$-th moment bound; and can be efficiently approximated using a sublinear number of sufficient statistics. In certain cases this lower CS can be converted into a closed-interval CS whose width converges to zero, e.g., any bounded realization, or post contextual-bandit inference with bounded rewards and unbounded importance weights. A reference implementation and example simulations demonstrate the technique. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: Reference implementation at https://github.com/microsoft/csrobust

arXiv:2210.10768 [pdf, other]

Anytime-valid off-policy inference for contextual bandits

Authors: Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro

Abstract: Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to actions $A_t$ in an attempt to maximize stochastic rewards $R_t$. This adaptivity raises interesting but hard statistical inference questions, especially counte… ▽ More Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to actions $A_t$ in an attempt to maximize stochastic rewards $R_t$. This adaptivity raises interesting but hard statistical inference questions, especially counterfactual ones: for example, it is often of interest to estimate the properties of a hypothetical policy that is different from the logging policy that was used to collect the data -- a problem known as ``off-policy evaluation'' (OPE). Using modern martingale techniques, we present a comprehensive framework for OPE inference that relax unnecessary conditions made in some past works, significantly improving on them both theoretically and empirically. Importantly, our methods can be employed while the original experiment is still running (that is, not necessarily post-hoc), when the logging policy may be itself changing (due to learning), and even if the context distributions are a highly dependent time-series (such as if they are drifting over time). More concretely, we derive confidence sequences for various functionals of interest in OPE. These include doubly robust ones for time-varying off-policy mean reward values, but also confidence bands for the entire cumulative distribution function of the off-policy reward distribution. All of our methods (a) are valid at arbitrary stopping times (b) only make nonparametric assumptions, (c) do not require importance weights to be uniformly bounded and if they are, we do not need to know these bounds, and (d) adapt to the empirical variance of our estimators. In summary, our methods enable anytime-valid off-policy inference using adaptively collected contextual bandit data. △ Less

Submitted 15 August, 2024; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 43 pages, 6 figures

arXiv:2207.05849 [pdf, other]

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

Authors: Yinglun Zhu, Paul Mineiro

Abstract: Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control. While obtaining standard regret guarantees can be hopeless, alternative regret notions have been proposed to tackle the large action setting. We… ▽ More Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control. While obtaining standard regret guarantees can be hopeless, alternative regret notions have been proposed to tackle the large action setting. We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives. We design a statistically and computationally efficient algorithm -- for the proposed smooth regret -- that works with general function approximation under standard supervised oracles. We also present an adaptive algorithm that automatically adapts to any smoothness level. Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret, e.g., in bandit problems with multiple best arms and Lipschitz/H{ö}lder bandits. We conduct large-scale empirical evaluations demonstrating the efficacy of our proposed algorithms. △ Less

Submitted 12 July, 2022; originally announced July 2022.

Comments: To appear at ICML 2022

arXiv:2207.05836 [pdf, other]

Contextual Bandits with Large Action Spaces: Made Practical

Authors: Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

Abstract: A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decisi… ▽ More A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice. We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces. Our algorithm makes use of computational oracles for (i) supervised learning, and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space. In addition, it is simple and practical. We perform a large-scale empirical evaluation, and show that our approach typically enjoys superior performance and efficiency compared to standard baselines. △ Less

Submitted 12 July, 2022; originally announced July 2022.

Comments: To appear at ICML 2022

arXiv:2206.08364 [pdf, other]

Interaction-Grounded Learning with Action-inclusive Feedback

Authors: Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

Abstract: Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches f… ▽ More Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches fail when the feedback vector contains the action, which significantly limits IGL's success in many potential scenarios such as Brain-computer interface (BCI) or Human-computer interface (HCI) applications. We address this by creating an algorithm and analysis which allows IGL to work even when the feedback vector contains the action, encoded in any fashion. We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach. △ Less

Submitted 12 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: Published in NeurIPS 2022

arXiv:2106.06926 [pdf, other]

Bellman-consistent Pessimism for Offline Reinforcement Learning

Authors: Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

Abstract: The use of pessimism, when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning. Despite the robustness it adds to the algorithm, overly pessimistic reasoning can be equally damaging in precluding the discovery of good policies, which is an issue for the popular bonus-based pessimism. In this paper, we introduce the notion of Bell… ▽ More The use of pessimism, when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning. Despite the robustness it adds to the algorithm, overly pessimistic reasoning can be equally damaging in precluding the discovery of good policies, which is an issue for the popular bonus-based pessimism. In this paper, we introduce the notion of Bellman-consistent pessimism for general function approximation: instead of calculating a point-wise lower bound for the value function, we implement pessimism at the initial state over the set of functions consistent with the Bellman equations. Our theoretical guarantees only require Bellman closedness as standard in the exploratory setting, in which case bonus-based pessimism fails to provide guarantees. Even in the special case of linear function approximation where stronger expressivity assumptions hold, our result improves upon a recent bonus-based approach by $\mathcal{O}(d)$ in its sample complexity when the action space is finite. Remarkably, our algorithms automatically adapt to the best bias-variance tradeoff in the hindsight, whereas most prior approaches require tuning extra hyperparameters a priori. △ Less

Submitted 23 October, 2023; v1 submitted 13 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 (Oral)

arXiv:2106.04887 [pdf, other]

Interaction-Grounded Learning

Authors: Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

Abstract: Consider a prosthetic arm, learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional context vec… ▽ More Consider a prosthetic arm, learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional context vector, takes an action, and then observes a multidimensional feedback vector. This multidimensional feedback vector has no explicit reward information. In order to succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal, with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach. △ Less

Submitted 13 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: Published in ICML 2021

arXiv:2102.09540 [pdf, other]

Off-policy Confidence Sequences

Authors: Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas

Abstract: We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statisti… ▽ More We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency. We empirically demonstrate the tightness of our approach in terms of failure probability and width and apply it to the "gated deployment" problem of safely upgrading a production contextual bandit system. △ Less

Submitted 18 February, 2021; originally announced February 2021.

arXiv:1906.03323 [pdf, other]

Empirical Likelihood for Contextual Bandits

Authors: Nikos Karampatziakis, John Langford, Paul Mineiro

Abstract: We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that se… ▽ More We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that searches for policies with large reward lower bound. We empirically find that both our estimator and confidence interval improve over previous proposals in finite sample regimes. Finally, the policy optimization algorithm we propose outperforms a strong baseline system for learning from off-policy data. △ Less

Submitted 17 October, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

Comments: Accepted at NeurIPS 2020

arXiv:1905.02219 [pdf, other]

Lessons from Contextual Bandit Learning in a Customer Support Bot

Authors: Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen

Abstract: In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support. While our current use cases focus on single step einforcement learning (RL) and mostly in the domain of natural language processing and information retrieval we believe many of our findings are generally ap… ▽ More In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support. While our current use cases focus on single step einforcement learning (RL) and mostly in the domain of natural language processing and information retrieval we believe many of our findings are generally applicable. Through this article, we highlight certain issues that RL practitioners may encounter in similar types of applications as well as offer practical solutions to these challenges. △ Less

Submitted 18 June, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: Reinforcement Learning for Real Life Workshop

arXiv:1807.06473 [pdf, other]

Contextual Memory Trees

Authors: Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

Abstract: We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substanti… ▽ More We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substantially increasing training and inference computation. Furthermore CMT operates as a reduction to classification, allowing it to benefit from advances in representation or architecture. We demonstrate the efficacy of CMT by augmenting existing multi-class and multi-label classification algorithms with CMT and observe statistical improvement. We also test CMT learning on several image-captioning tasks to demonstrate that it performs computationally better than a simple nearest neighbors memory system while benefitting from reward learning. △ Less

Submitted 2 June, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: ICM 2019

arXiv:1606.04988 [pdf, other]

Logarithmic Time One-Against-Some

Authors: Hal Daume III, Nikos Karampatziakis, John Langford, Paul Mineiro

Abstract: We create a new online reduction of multiclass classification to binary classification for which training and prediction time scale logarithmically with the number of classes. Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an… ▽ More We create a new online reduction of multiclass classification to binary classification for which training and prediction time scale logarithmically with the number of classes. Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an algorithm. We show that several simple techniques give rise to an algorithm that can compete with one-against-all in both space and predictive power while offering exponential improvements in speed when the number of classes is large. △ Less

Submitted 30 November, 2016; v1 submitted 15 June, 2016; originally announced June 2016.

arXiv:1602.02181 [pdf, other]

Active Information Acquisition

Authors: He He, Paul Mineiro, Nikos Karampatziakis

Abstract: We propose a general framework for sequential and dynamic acquisition of useful information in order to solve a particular task. While our goal could in principle be tackled by general reinforcement learning, our particular setting is constrained enough to allow more efficient algorithms. In this paper, we work under the Learning to Search framework and show how to formulate the goal of finding a… ▽ More We propose a general framework for sequential and dynamic acquisition of useful information in order to solve a particular task. While our goal could in principle be tackled by general reinforcement learning, our particular setting is constrained enough to allow more efficient algorithms. In this paper, we work under the Learning to Search framework and show how to formulate the goal of finding a dynamic information acquisition policy in that framework. We apply our formulation on two tasks, sentiment analysis and image recognition, and show that the learned policies exhibit good statistical performance. As an emergent byproduct, the learned policies show a tendency to focus on the most prominent parts of each instance and give harder instances more attention without explicitly being trained to do so. △ Less

Submitted 5 February, 2016; originally announced February 2016.

arXiv:1511.03260 [pdf, ps, other]

A Hierarchical Spectral Method for Extreme Classification

Authors: Paul Mineiro, Nikos Karampatziakis

Abstract: Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable. One strategy for dealing with the computational burden is via a tree decomposition of the output space. While this typically leads to training and inference that scales sublinearly with th… ▽ More Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable. One strategy for dealing with the computational burden is via a tree decomposition of the output space. While this typically leads to training and inference that scales sublinearly with the number of outputs, it also results in reduced statistical performance. In this work, we identify two shortcomings of tree decomposition methods, and describe two heuristic mitigations. We compose these with an eigenvalue technique for constructing the tree. The end result is a computationally efficient algorithm that provides good statistical performance on several extreme data sets. △ Less

Submitted 3 February, 2016; v1 submitted 10 November, 2015; originally announced November 2015.

Comments: Reference implementation available at https://github.com/pmineiro/xlst

arXiv:1411.3409 [pdf, other]

A Randomized Algorithm for CCA

Authors: Paul Mineiro, Nikos Karampatziakis

Abstract: We present RandomizedCCA, a randomized algorithm for computing canonical analysis, suitable for large datasets stored either out of core or on a distributed file system. Accurate results can be obtained in as few as two data passes, which is relevant for distributed processing frameworks in which iteration is expensive (e.g., Hadoop). The strategy also provides an excellent initializer for standar… ▽ More We present RandomizedCCA, a randomized algorithm for computing canonical analysis, suitable for large datasets stored either out of core or on a distributed file system. Accurate results can be obtained in as few as two data passes, which is relevant for distributed processing frameworks in which iteration is expensive (e.g., Hadoop). The strategy also provides an excellent initializer for standard iterative solutions. △ Less

Submitted 12 November, 2014; originally announced November 2014.

arXiv:1408.2065 [pdf]

Normalized Online Learning

Authors: Stephane Ross, Paul Mineiro, John Langford

Abstract: We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust. We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust. △ Less

Submitted 9 August, 2014; originally announced August 2014.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

Report number: UAI-P-2013-PG-537-545

arXiv:1310.1934 [pdf, other]

Discriminative Features via Generalized Eigenvectors

Authors: Nikos Karampatziakis, Paul Mineiro

Abstract: Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system. In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second order structure in the data. We focus on multiclass classification and show that features extracted from the generalized eigenvectors of t… ▽ More Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system. In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second order structure in the data. We focus on multiclass classification and show that features extracted from the generalized eigenvectors of the class conditional second moments lead to classifiers with excellent empirical performance. Moreover, these features have attractive theoretical properties, such as inducing representations that are invariant to linear transformations of the input. We evaluate classifiers built from these features on three different tasks, obtaining state of the art results. △ Less

Submitted 7 October, 2013; originally announced October 2013.

arXiv:1306.1840 [pdf, other]

Loss-Proportional Subsampling for Subsequent ERM

Authors: Paul Mineiro, Nikos Karampatziakis

Abstract: We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset… ▽ More We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees. △ Less

Submitted 23 June, 2013; v1 submitted 7 June, 2013; originally announced June 2013.

Comments: Appears in the proceedings of the 30th International Conference on Machine Learning

arXiv:1305.6646 [pdf, other]

Normalized Online Learning

Authors: Stephane Ross, Paul Mineiro, John Langford

Abstract: We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust. We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust. △ Less

Submitted 28 May, 2013; originally announced May 2013.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

Showing 1–23 of 23 results for author: Mineiro, P