
Showing 1–23 of 23 results for author: Mineiro, P

  1. arXiv:2406.10490  [pdf, other]

    stat.ML cs.LG

    Active, anytime-valid risk controlling prediction sets

    Authors: Ziyu Xu, Nikos Karampatziakis, Paul Mineiro

    Abstract: Rigorously establishing the safety of black-box machine learning models concerning critical risk measures is important for providing guarantees about model behavior. Recently, Bates et al. (JACM '24) introduced the notion of a risk controlling prediction set (RCPS) for producing prediction sets that are statistically guaranteed low risk from machine learning models. Our method extends this notion…

    Submitted 31 October, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 22 pages, 3 figures. Accepted to NeurIPS 2024

  2. arXiv:2405.20677  [pdf, other]

    cs.LG stat.ML

    Provably Efficient Interactive-Grounded Learning with Personalized Reward

    Authors: Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

    Abstract: Interactive-Grounded Learning (IGL) [Xie et al., 2021] is a powerful framework in which a learner aims at maximizing unobservable rewards through interacting with an environment and observing reward-dependent feedback on the taken actions. To deal with personalized rewards that are ubiquitous in applications such as recommendation systems, Maghakian et al. [2022] study a version of IGL with contex…

    Submitted 31 May, 2024; originally announced May 2024.

  3. arXiv:2302.14248  [pdf, other]

    stat.ML cs.LG

    Time-uniform confidence bands for the CDF under nonstationarity

    Authors: Paul Mineiro, Steven R. Howard

    Abstract: Estimation of the complete distribution of a random variable is a useful primitive for both manual and automated decision making. This problem has received extensive attention in the i.i.d. setting, but the arbitrary data dependent setting remains largely unaddressed. Consistent with known impossibility results, we present computationally felicitous time-uniform and value-uniform bounds on the CDF…

    Submitted 27 February, 2023; originally announced February 2023.

  4. arXiv:2210.13573  [pdf, other]

    stat.ML cs.LG

    Conditionally Risk-Averse Contextual Bandits

    Authors: Mónika Farsang, Paul Mineiro, Wangda Zhang

    Abstract: Contextual bandits with average-case statistical guarantees are inadequate in risk-averse situations because they might trade off degraded worst-case behaviour for better average performance. Designing a risk-averse contextual bandit is challenging because exploration is necessary but risk-aversion is sensitive to the entire distribution of rewards; nonetheless we exhibit the first risk-averse con…

    Submitted 8 July, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

  5. arXiv:2210.11133  [pdf, other]

    stat.ML cs.LG

    A lower confidence sequence for the changing mean of non-negative right heavy-tailed observations with bounded mean

    Authors: Paul Mineiro

    Abstract: A confidence sequence (CS) is an anytime-valid sequential inference primitive which produces an adapted sequence of sets for a predictable parameter sequence with a time-uniform coverage guarantee. This work constructs a non-parametric non-asymptotic lower CS for the running average conditional expectation whose slack converges to zero given non-negative right heavy-tailed observations with bounde…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Reference implementation at https://github.com/microsoft/csrobust

  6. arXiv:2210.10768  [pdf, other]

    stat.ME cs.LG math.ST stat.ML

    Anytime-valid off-policy inference for contextual bandits

    Authors: Ian Waudby-Smith, Lili Wu, Aaditya Ramdas, Nikos Karampatziakis, Paul Mineiro

    Abstract: Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to actions $A_t$ in an attempt to maximize stochastic rewards $R_t$. This adaptivity raises interesting but hard statistical inference questions, especially counte…

    Submitted 15 August, 2024; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: 43 pages, 6 figures

  7. arXiv:2207.05849  [pdf, other]

    cs.LG stat.ML

    Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

    Authors: Yinglun Zhu, Paul Mineiro

    Abstract: Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control. While obtaining standard regret guarantees can be hopeless, alternative regret notions have been proposed to tackle the large action setting. We…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: To appear at ICML 2022

  8. arXiv:2207.05836  [pdf, other]

    cs.LG stat.ML

    Contextual Bandits with Large Action Spaces: Made Practical

    Authors: Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

    Abstract: A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decisi…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: To appear at ICML 2022

  9. arXiv:2206.08364  [pdf, other]

    cs.LG cs.AI cs.HC stat.ML

    Interaction-Grounded Learning with Action-inclusive Feedback

    Authors: Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

    Abstract: Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches f…

    Submitted 12 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Published in NeurIPS 2022

  10. arXiv:2106.06926  [pdf, other]

    cs.LG cs.AI stat.ML

    Bellman-consistent Pessimism for Offline Reinforcement Learning

    Authors: Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

    Abstract: The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning. Despite the robustness it adds to the algorithm, overly pessimistic reasoning can be equally damaging in precluding the discovery of good policies, which is an issue for the popular bonus-based pessimism. In this paper, we introduce the notion of Bell…

    Submitted 23 October, 2023; v1 submitted 13 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 (Oral)

  11. arXiv:2106.04887  [pdf, other]

    cs.LG cs.AI stat.ML

    Interaction-Grounded Learning

    Authors: Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

    Abstract: Consider a prosthetic arm, learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional context vec…

    Submitted 13 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: Published in ICML 2021

  12. arXiv:2102.09540  [pdf, other]

    cs.LG math.ST stat.ML

    Off-policy Confidence Sequences

    Authors: Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas

    Abstract: We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statisti…

    Submitted 18 February, 2021; originally announced February 2021.

  13. arXiv:1906.03323  [pdf, other]

    cs.LG stat.ML

    Empirical Likelihood for Contextual Bandits

    Authors: Nikos Karampatziakis, John Langford, Paul Mineiro

    Abstract: We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that se…

    Submitted 17 October, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: Accepted at NeurIPS 2020

  14. arXiv:1905.02219  [pdf, other]

    cs.LG stat.ML

    Lessons from Contextual Bandit Learning in a Customer Support Bot

    Authors: Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen

    Abstract: In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support. While our current use cases focus on single-step reinforcement learning (RL), mostly in the domain of natural language processing and information retrieval, we believe many of our findings are generally ap…

    Submitted 18 June, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: Reinforcement Learning for Real Life Workshop

  15. arXiv:1807.06473  [pdf, other]

    cs.LG stat.ML

    Contextual Memory Trees

    Authors: Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

    Abstract: We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substanti…

    Submitted 2 June, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

    Comments: ICML 2019

  16. arXiv:1606.04988  [pdf, other]

    stat.ML cs.LG

    Logarithmic Time One-Against-Some

    Authors: Hal Daume III, Nikos Karampatziakis, John Langford, Paul Mineiro

    Abstract: We create a new online reduction of multiclass classification to binary classification for which training and prediction time scale logarithmically with the number of classes. Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an…

    Submitted 30 November, 2016; v1 submitted 15 June, 2016; originally announced June 2016.

  17. arXiv:1602.02181  [pdf, other]

    stat.ML cs.LG

    Active Information Acquisition

    Authors: He He, Paul Mineiro, Nikos Karampatziakis

    Abstract: We propose a general framework for sequential and dynamic acquisition of useful information in order to solve a particular task. While our goal could in principle be tackled by general reinforcement learning, our particular setting is constrained enough to allow more efficient algorithms. In this paper, we work under the Learning to Search framework and show how to formulate the goal of finding a…

    Submitted 5 February, 2016; originally announced February 2016.

  18. arXiv:1511.03260  [pdf, ps, other]

    stat.ML cs.LG

    A Hierarchical Spectral Method for Extreme Classification

    Authors: Paul Mineiro, Nikos Karampatziakis

    Abstract: Extreme classification problems are multiclass and multilabel classification problems where the number of outputs is so large that straightforward strategies are neither statistically nor computationally viable. One strategy for dealing with the computational burden is via a tree decomposition of the output space. While this typically leads to training and inference that scales sublinearly with th…

    Submitted 3 February, 2016; v1 submitted 10 November, 2015; originally announced November 2015.

    Comments: Reference implementation available at https://github.com/pmineiro/xlst

  19. arXiv:1411.3409  [pdf, other]

    stat.ML cs.LG

    A Randomized Algorithm for CCA

    Authors: Paul Mineiro, Nikos Karampatziakis

    Abstract: We present RandomizedCCA, a randomized algorithm for computing canonical analysis, suitable for large datasets stored either out of core or on a distributed file system. Accurate results can be obtained in as few as two data passes, which is relevant for distributed processing frameworks in which iteration is expensive (e.g., Hadoop). The strategy also provides an excellent initializer for standar…

    Submitted 12 November, 2014; originally announced November 2014.

  20. arXiv:1408.2065  [pdf]

    cs.LG stat.ML

    Normalized Online Learning

    Authors: Stephane Ross, Paul Mineiro, John Langford

    Abstract: We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.

    Submitted 9 August, 2014; originally announced August 2014.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-537-545

  21. arXiv:1310.1934  [pdf, other]

    cs.LG stat.ML

    Discriminative Features via Generalized Eigenvectors

    Authors: Nikos Karampatziakis, Paul Mineiro

    Abstract: Representing examples in a way that is compatible with the underlying classifier can greatly enhance the performance of a learning system. In this paper we investigate scalable techniques for inducing discriminative features by taking advantage of simple second order structure in the data. We focus on multiclass classification and show that features extracted from the generalized eigenvectors of t…

    Submitted 7 October, 2013; originally announced October 2013.

  22. arXiv:1306.1840  [pdf, other]

    cs.LG stat.ML

    Loss-Proportional Subsampling for Subsequent ERM

    Authors: Paul Mineiro, Nikos Karampatziakis

    Abstract: We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset…

    Submitted 23 June, 2013; v1 submitted 7 June, 2013; originally announced June 2013.

    Comments: Appears in the proceedings of the 30th International Conference on Machine Learning

  23. arXiv:1305.6646  [pdf, other]

    cs.LG stat.ML

    Normalized Online Learning

    Authors: Stephane Ross, Paul Mineiro, John Langford

    Abstract: We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust.

    Submitted 28 May, 2013; originally announced May 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)