
Showing 1–50 of 133 results for author: Kallus, N

  1. arXiv:2506.03324  [pdf, ps, other]

    cs.LG

    Optimization of Epsilon-Greedy Exploration

    Authors: Ethan Che, Hakan Ceylan, James McInerney, Nathan Kallus

    Abstract: Modern recommendation systems rely on exploration to learn user preferences for new items, typically implementing uniform exploration policies (e.g., epsilon-greedy) due to their simplicity and compatibility with machine learning (ML) personalization models. Within these systems, a crucial consideration is the rate of exploration - what fraction of user traffic should receive random item recommend…

    Submitted 3 June, 2025; originally announced June 2025.

  2. arXiv:2506.02881  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Simulation-Based Inference for Adaptive Experiments

    Authors: Brian M Cho, Aurélien Bibaut, Nathan Kallus

    Abstract: Multi-arm bandit experimental designs are increasingly being adopted over standard randomized trials due to their potential to improve outcomes for study participants, enable faster identification of the best-performing options, and/or enhance the precision of estimating key parameters. Current approaches for inference after adaptive sampling either rely on asymptotic normality under restricted ex…

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2505.17468  [pdf, ps, other]

    stat.ME cs.LG stat.ML

    Efficient Adaptive Experimentation with Non-Compliance

    Authors: Miruna Oprescu, Brian M Cho, Nathan Kallus

    Abstract: We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged--rather than directly assigned--via a binary instrumental variable. Building on semiparametric efficiency theory, we derive the efficiency bound for ATE estimation under arbitrary, history-dependent instrument-assignment policies, and show it is minimized by a varian…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 26 pages, 3 figures

  4. arXiv:2505.17373  [pdf, other]

    cs.LG cs.AI cs.CL

    Value-Guided Search for Efficient Chain-of-Thought Reasoning

    Authors: Kaiwen Wang, Jin Peng Zhou, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun

    Abstract: In this paper, we propose a simple and efficient method for value model training on long-context reasoning traces. Compared to existing process reward models (PRMs), our method does not require a fine-grained notion of "step," which is difficult to define for long-context reasoning models. By collecting a dataset of 2.5 million reasoning traces, we train a 1.5B token-level value model and apply it…

    Submitted 22 May, 2025; originally announced May 2025.

  5. arXiv:2505.07729  [pdf, ps, other]

    stat.ME math.ST stat.ML

    Nonparametric Instrumental Variable Inference with Many Weak Instruments

    Authors: Lars van der Laan, Nathan Kallus, Aurélien Bibaut

    Abstract: We study inference on linear functionals in the nonparametric instrumental variable (NPIV) problem with a discretely-valued instrument under a many-weak-instruments asymptotic regime, where the number of instrument values grows with the sample size. A key motivating example is estimating long-term causal effects in a new experiment with only short-term outcomes, using past experiments to instrumen…

    Submitted 12 May, 2025; originally announced May 2025.

  6. arXiv:2504.15476  [pdf, other]

    cs.IR

    From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System

    Authors: Rohan Surana, Junda Wu, Zhouhang Xie, Yu Xia, Harald Steck, Dawen Liang, Nathan Kallus, Julian McAuley

    Abstract: Conversational recommender systems (CRS) typically require extensive domain-specific conversational datasets, yet high costs, privacy concerns, and data-collection challenges severely limit their availability. Although Large Language Models (LLMs) demonstrate strong zero-shot recommendation capabilities, practical applications often favor smaller, internally managed recommender models due to scala…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 11 pages, 2 figures

  7. arXiv:2503.12760  [pdf, other]

    stat.ML cs.LG econ.EM

    SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

    Authors: Brian Cho, Ana-Roxana Pop, Ariel Evnine, Nathan Kallus

    Abstract: To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail outcomes. To provide credible recommendations, experimenters must not only identify policies that satisfy the desir…

    Submitted 21 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  8. arXiv:2502.20548  [pdf, other]

    cs.LG cs.AI cs.CL

    $Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

    Authors: Jin Peng Zhou, Kaiwen Wang, Jonathan Chang, Zhaolin Gao, Nathan Kallus, Kilian Q. Weinberger, Kianté Brantley, Wen Sun

    Abstract: Reinforcement learning (RL) post-training is crucial for LLM alignment and reasoning, but existing policy-based methods, such as PPO and DPO, can fall short of fixing shortcuts inherited from pre-training. In this work, we introduce $Q\sharp$, a value-based algorithm for KL-regularized RL that guides the reference policy using the optimal regularized $Q$ function. We propose to learn the optimal…

    Submitted 27 February, 2025; originally announced February 2025.

  9. arXiv:2502.14137  [pdf, other]

    cs.IR

    Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems

    Authors: Yaochen Zhu, Chao Wan, Harald Steck, Dawen Liang, Yesu Feng, Nathan Kallus, Jundong Li

    Abstract: Conversational recommender systems (CRS) aim to provide personalized recommendations via interactive dialogues with users. While large language models (LLMs) enhance CRS with their superior understanding of context-aware user preferences, they typically struggle to leverage behavioral data, which have proven to be important for classical collaborative filtering (CF)-based approaches. For this reas…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Accepted by WWW'2025

  10. Evaluating Decision Rules Across Many Weak Experiments

    Authors: Winston Chou, Colin Gray, Nathan Kallus, Aurélien Bibaut, Simon Ejdemyr

    Abstract: Technology firms conduct randomized controlled experiments ("A/B tests") to learn which actions to take to improve business outcomes. In firms with mature experimentation platforms, experimentation programs can consist of many thousands of tests. To effectively scale experimentation, firms rely on decision rules: standard operating procedures for mapping the results of an experiment to a choice of…

    Submitted 29 May, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

    Comments: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25), August 3--7, 2025, Toronto, ON, Canada

  11. arXiv:2502.05295  [pdf, other]

    cs.LG stat.ME

    GST-UNet: Spatiotemporal Causal Inference with Time-Varying Confounders

    Authors: Miruna Oprescu, David K. Park, Xihaier Luo, Shinjae Yoo, Nathan Kallus

    Abstract: Estimating causal effects from spatiotemporal data is a key challenge in fields such as public health, social policy, and environmental science, where controlled experiments are often infeasible. However, existing causal inference methods relying on observational data face significant limitations: they depend on strong structural assumptions to address spatiotemporal challenges – s…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: 17 pages, 6 figures, 2 tables

  12. arXiv:2501.11868  [pdf, other]

    stat.ME math.ST stat.ML

    Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands

    Authors: Lars van der Laan, Aurélien Bibaut, Nathan Kallus, Alex Luedtke

    Abstract: We propose a unified framework for automatic debiased machine learning (autoDML) to perform inference on smooth functionals of infinite-dimensional M-estimands, defined as population risk minimizers over Hilbert spaces. By automating debiased estimation and inference procedures in causal inference and semiparametric statistics, our framework enables practitioners to construct valid estimators for…

    Submitted 20 January, 2025; originally announced January 2025.

  13. arXiv:2501.06926  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference

    Authors: Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut

    Abstract: Long-term causal effects often must be estimated from short-term data due to limited follow-up in healthcare, economics, and online platforms. Markov Decision Processes (MDPs) provide a natural framework for capturing such long-term dynamics through sequences of states, actions, and rewards. Double Reinforcement Learning (DRL) enables efficient inference on policy values in MDPs, but nonparametric…

    Submitted 30 June, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  14. arXiv:2410.15564  [pdf, other]

    cs.LG stat.ME stat.ML

    Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits

    Authors: Brian Cho, Dominik Meier, Kyra Gan, Nathan Kallus

    Abstract: In multi-armed bandits, the tasks of reward maximization and pure exploration are often at odds with each other. The former focuses on exploiting arms with the highest means, while the latter may require constant exploration across all arms. In this work, we focus on good arm identification (GAI), a practical bandit inference objective that aims to label arms with means above a threshold as quickl…

    Submitted 20 October, 2024; originally announced October 2024.

  15. arXiv:2410.09282  [pdf, other]

    stat.ME math.ST

    Anytime-Valid Continuous-Time Confidence Processes for Inhomogeneous Poisson Processes

    Authors: Michael Lindon, Nathan Kallus

    Abstract: Motivated by monitoring the arrival of incoming adverse events such as customer support calls or crash reports from users exposed to an experimental product change, we consider sequential hypothesis testing of continuous-time inhomogeneous Poisson point processes. Specifically, we provide an interval-valued confidence process $C^α(t)$ over continuous time $t$ for the cumulative arrival rate…

    Submitted 11 October, 2024; originally announced October 2024.

  16. arXiv:2409.17466  [pdf, other]

    stat.ML cs.AI cs.LG

    Adjusting Regression Models for Conditional Uncertainty Calibration

    Authors: Ruijiang Gao, Mingzhang Yin, James McInerney, Nathan Kallus

    Abstract: Conformal Prediction methods have finite-sample distribution-free marginal coverage guarantees. However, they generally do not offer conditional coverage guarantees, which can be important for high-stakes decisions. In this paper, we propose a novel algorithm to train a regression function to improve the conditional coverage after applying the split conformal prediction procedure. We establish an…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Machine Learning Special Issue on Uncertainty Quantification

  17. arXiv:2409.12799  [pdf, ps, other]

    stat.ML cs.LG math.ST

    The Central Role of the Loss Function in Reinforcement Learning

    Authors: Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey on their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms. Across multiple settings, we prove that algo…

    Submitted 4 April, 2025; v1 submitted 19 September, 2024; originally announced September 2024.

    Comments: Accepted to Statistical Science

  18. arXiv:2408.12004  [pdf, other]

    cs.LG stat.ME stat.ML

    CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies

    Authors: Brian M Cho, Ana-Roxana Pop, Kyra Gan, Sam Corbett-Davies, Israel Nir, Ariel Evnine, Nathan Kallus

    Abstract: When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least pre-specified probability. We focus on…

    Submitted 21 August, 2024; originally announced August 2024.

  19. arXiv:2406.14140  [pdf, other]

    math.ST

    Nonparametric Jackknife Instrumental Variable Estimation and Confounding Robust Surrogate Indices

    Authors: Aurélien Bibaut, Nathan Kallus, Apoorva Lal

    Abstract: Jackknife instrumental variable estimation (JIVE) is a classic method to leverage many weak instrumental variables (IVs) to estimate linear structural models, overcoming the bias of standard methods like two-stage least squares. In this paper, we extend the jackknife approach to nonparametric IV (NPIV) models with many weak IVs. Since NPIV characterizes the structural regression as having residual…

    Submitted 7 October, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.06452  [pdf, other]

    stat.ME cs.LG stat.ML

    Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

    Authors: Miruna Oprescu, Nathan Kallus

    Abstract: Accurately predicting conditional average treatment effects (CATEs) is crucial in personalized medicine and digital platform analytics. Since the treatments of interest often cannot be directly randomized, observational data is leveraged to learn CATEs, but this approach can incur significant bias from unobserved confounding. One strategy to overcome these limitations is to leverage instrumental v…

    Submitted 1 November, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 30 pages, 4 figures, NeurIPS 2024

  21. arXiv:2405.16564  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Contextual Linear Optimization with Bandit Feedback

    Authors: Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

    Abstract: Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients and thereby improve average-cost performance. An example is the stochastic shortest path problem with random edge costs (e.g., traffic) and contextual features (e.g., lagged traffic, weather). Existing work on CLO assumes the data has fully observed cost coefficient vectors, b…

    Submitted 17 October, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  22. arXiv:2405.12119  [pdf, other]

    cs.IR cs.AI cs.CL

    Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation

    Authors: Zhankui He, Zhouhang Xie, Harald Steck, Dawen Liang, Rahul Jha, Nathan Kallus, Julian McAuley

    Abstract: Large language models (LLMs) are revolutionizing conversational recommender systems by adeptly indexing item content, understanding complex conversational contexts, and generating relevant item titles. However, controlling the distribution of recommended items remains a challenge. This leads to suboptimal performance due to the failure to capture rapidly changing data distributions, such as item p…

    Submitted 20 May, 2024; originally announced May 2024.

  23. arXiv:2405.01281  [pdf, ps, other]

    stat.ME econ.EM math.ST stat.ML

    Demistifying Inference after Adaptive Experiments

    Authors: Aurélien Bibaut, Nathan Kallus

    Abstract: Adaptive experiments such as multi-arm bandits adapt the treatment-allocation policy and/or the decision to stop the experiment to the data observed so far. This has the potential to improve outcomes for study participants within the experiment, to improve the chance of identifying best treatments after the experiment, and to avoid wasting data. Seen as an experiment (rather than just a continuall…

    Submitted 2 May, 2024; originally announced May 2024.

  24. arXiv:2404.00099  [pdf, other]

    cs.AI stat.ML

    Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

    Authors: Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang

    Abstract: We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP), using transition observations from the original MDP, whether they are generated under the same or a different policy. This is an important problem when there is the possibility of a shift between historical and future environments, e.g. due to unmeasured confounding, distribu…

    Submitted 1 November, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: 39 pages, 2 figures, NeurIPS 2024

  25. arXiv:2403.10671  [pdf, other]

    stat.ML cs.LG

    Variation Due to Regularization Tractably Recovers Bayesian Deep Learning

    Authors: James McInerney, Nathan Kallus

    Abstract: Uncertainty quantification in deep learning is crucial for safe and reliable decision-making in downstream tasks. Existing methods quantify uncertainty at the last layer or other approximations of the network which may miss some sources of uncertainty in the model. To address this gap, we propose an uncertainty quantification method for large networks based on variation due to regularization. Esse…

    Submitted 24 April, 2025; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 16 pages, 9 figures

  26. arXiv:2403.06323  [pdf, other]

    cs.LG

    A Reductions Approach to Risk-Sensitive Reinforcement Learning with Optimized Certainty Equivalents

    Authors: Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun

    Abstract: We study risk-sensitive RL where the goal is to learn a history-dependent policy that optimizes some risk measure of cumulative rewards. We consider a family of risks called the optimized certainty equivalents (OCE), which captures important risk measures such as conditional value-at-risk (CVaR), entropic risk and Markowitz's mean-variance. In this setting, we propose two meta-algorithms: one grounde…

    Submitted 27 February, 2025; v1 submitted 10 March, 2024; originally announced March 2024.

  27. Is Cosine-Similarity of Embeddings Really About Similarity?

    Authors: Harald Steck, Chaitanya Ekanadham, Nathan Kallus

    Abstract: Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 9 pages

    Journal ref: ACM Web Conference 2024 (WWW 2024 Companion)

  28. arXiv:2403.05385  [pdf, other]

    cs.LG

    Switching the Loss Reduces the Cost in Batch (Offline) Reinforcement Learning

    Authors: Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári

    Abstract: We propose training fitted Q-iteration with log-loss (FQI-log) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-log scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bo…

    Submitted 1 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  29. arXiv:2403.02467  [pdf]

    econ.EM cs.LG stat.ME stat.ML

    Applied Causal Inference Powered by ML and AI

    Authors: Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

    Abstract: An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.

    Submitted 4 March, 2024; originally announced March 2024.

  30. arXiv:2402.17637  [pdf, other]

    stat.ME

    Learning the Covariance of Treatment Effects Across Many Weak Experiments

    Authors: Aurélien Bibaut, Winston Chou, Simon Ejdemyr, Nathan Kallus

    Abstract: When primary objectives are insensitive or delayed, experimenters may instead focus on proxy metrics derived from secondary outcomes. For example, technology companies often infer the long-term impacts of product interventions from their effects on short-term user engagement signals. We consider the meta-analysis of many historical experiments to learn the covariance of treatment effects on these…

    Submitted 30 July, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  31. arXiv:2402.07198  [pdf, other]

    cs.LG

    More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

    Authors: Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun

    Abstract: In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the return distribution, can obtain second-order bounds in both online and offline RL in general settings with function approximation. Second-order bounds are instance-dependent bounds that scale with the variance of return, which we prove are tighter than the previously known small-loss bounds of distributio…

    Submitted 11 February, 2024; originally announced February 2024.

  32. arXiv:2402.06122  [pdf, other]

    stat.ME cs.LG stat.ML

    Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams

    Authors: Brian Cho, Kyra Gan, Nathan Kallus

    Abstract: We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, peeking with expectation-based averaged capital (PEAK), builds upon the testing-by-betting framework and provides a non-asymptotic $α$-level test across any stopping time. Our contributions are two-fold: (1) we propose a novel betting scheme and provide theoreti…

    Submitted 2 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: To appear at the Forty-first International Conference on Machine Learning (ICML 2024)

  33. arXiv:2402.01845  [pdf, other]

    cs.LG stat.ML

    Multi-Armed Bandits with Interference

    Authors: Su Jia, Peter Frazier, Nathan Kallus

    Abstract: Experimentation with interference poses a significant challenge in contemporary online platforms. Prior research on experimentation with interference has concentrated on the final output of a policy. The cumulative performance, while equally crucial, is less well understood. To address this gap, we introduce the problem of Multi-armed Bandits with Interference (MABI), where the learner assig…

    Submitted 15 July, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  34. arXiv:2312.15574  [pdf, other]

    math.ST cs.LG

    Clustered Switchback Designs for Experimentation Under Spatio-temporal Interference

    Authors: Su Jia, Nathan Kallus, Christina Lee Yu

    Abstract: We consider experimentation in the presence of non-stationarity, inter-unit (spatial) interference, and carry-over effects (temporal interference), where we wish to estimate the global average treatment effect (GATE), the difference between average outcomes having exposed all units at all times to treatment or to control. We suppose spatial interference is described by a graph, where a unit's outc…

    Submitted 26 March, 2025; v1 submitted 24 December, 2023; originally announced December 2023.

  35. arXiv:2311.11922  [pdf, other]

    stat.AP stat.ME

    Evaluating the Surrogate Index as a Decision-Making Tool Using 200 A/B Tests at Netflix

    Authors: Vickie Zhang, Michael Zhao, Maria Dimakopoulou, Anh Le, Nathan Kallus

    Abstract: Surrogate index approaches have recently become a popular method of estimating longer-term impact from shorter-term outcomes. In this paper, we leverage 1098 test arms from 200 A/B tests at Netflix to empirically investigate to what degree decisions made using a surrogate index utilizing 14 days of data would align with those made using direct measurement of day 63 treatment effects. Focusin…

    Submitted 30 January, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  36. arXiv:2311.08527  [pdf, other]

    stat.AP stat.ME

    Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments

    Authors: Allen Tran, Aurélien Bibaut, Nathan Kallus

    Abstract: We study inference on the long-term causal effect of a continual exposure to a novel intervention, which we term a long-term treatment, based on an experiment involving only short-term observations. Key examples include the long-term health effects of regularly-taken medicine or of environmental hazards and the long-term effects on users of changes to an online platform. This stands in contrast to…

    Submitted 4 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted into ICML 2024 - typos etc and extended literature review

  37. arXiv:2311.04657  [pdf, other]

    stat.ME math.ST

    Long-Term Causal Inference with Imperfect Surrogates using Many Weak Experiments, Proxies, and Cross-Fold Moments

    Authors: Aurélien Bibaut, Nathan Kallus, Simon Ejdemyr, Michael Zhao

    Abstract: Inferring causal effects on long-term outcomes using short-term surrogates is crucial to rapid innovation. However, even when treatments are randomized and surrogates fully mediate their effect on outcomes, it's possible that we get the direction of causal effects wrong due to confounding between surrogates and outcomes -- a situation famously known as the surrogate paradox. The availability of ma…

    Submitted 8 November, 2023; originally announced November 2023.

  38. arXiv:2311.03564  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Low-Rank MDPs with Continuous Action Spaces

    Authors: Andrew Bennett, Nathan Kallus, Miruna Oprescu

    Abstract: Low-Rank Markov Decision Processes (MDPs) have recently emerged as a promising framework within the domain of reinforcement learning (RL), as they allow for provably approximately correct (PAC) learning guarantees while also incorporating ML algorithms for representation learning. However, current methods for low-rank MDPs are limited in that they only consider finite action spaces, and give vacuo…

    Submitted 1 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: 25 pages, AISTATS 2024

    Journal ref: PMLR, Volume 238, 2024

  39. arXiv:2310.15433  [pdf, other]

    cs.LG cs.IR

    Off-Policy Evaluation for Large Action Spaces via Policy Convolution

    Authors: Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley

    Abstract: Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiase…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Under review. 36 pages, 31 figures

  40. Large Language Models as Zero-Shot Conversational Recommenders

    Authors: Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, Julian McAuley

    Abstract: In this paper, we present empirical studies on conversational recommendation tasks using representative large language models in a zero-shot setting with three primary contributions. (1) Data: To gain insights into model behavior in "in-the-wild" conversational recommendation scenarios, we construct a new dataset of recommendation-related conversations by scraping a popular discussion website. Thi…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted as CIKM 2023 long paper. Longer version is coming soon (e.g., more details about dataset)

  41. arXiv:2307.13793  [pdf, ps, other]

    stat.ME cs.LG econ.EM math.ST stat.ML

    Source Condition Double Robust Inference on Functionals of Inverse Problems

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought of as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ens…

    Submitted 25 July, 2023; originally announced July 2023.

  42. arXiv:2307.11704  [pdf, other]

    cs.LG

    JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

    Authors: Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun

    Abstract: Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost and it is the core NP-hard combinatorial optimization problem of query optimization. In this paper, we present JoinGym, a lightweight and easy-to-use query optimization environment for reinforcement learning (RL) that captures both the left-deep and bushy variants of the JOS problem. Compar…

    Submitted 17 October, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: JoinGym is available at https://github.com/kaiwenw/JoinGym!

  43. arXiv:2305.15703  [pdf, ps, other]

    cs.LG cs.AI math.OC math.ST stat.ML

    The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

    Authors: Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun

    Abstract: While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with optimal achievable cost. Particularly, our bounds converge much faster than th…

    Submitted 22 September, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  44. arXiv:2305.14816  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Provable Offline Preference-Based Reinforcement Learning

    Authors: Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offl…

    Submitted 29 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: The first two authors contribute equally

  45. arXiv:2304.10577  [pdf, other]

    cs.LG stat.ML

    B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding

    Authors: Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit

    Abstract: Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitraril…

    Submitted 13 June, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: 20 pages, 4 figures, ICML 2023

    Journal ref: PMLR 202 (2023) 26599-26618

  46. arXiv:2302.05404  [pdf, ps, other]

    stat.ML cs.LG econ.EM math.ST stat.ME

    Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: In this paper, we study nonparametric estimation of instrumental variable (IV) regressions. Recently, many flexible machine learning methods have been developed for instrumental variable estimation. However, these methods have at least one of the following limitations: (1) restricting the IV regression to be uniquely identified; (2) only obtaining estimation error rates in terms of pseudometrics (…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Under review

  47. arXiv:2302.03201  [pdf, ps, other]

    cs.LG math.OC math.ST stat.ML

    Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

    Authors: Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $\Omega(\sqrt{\tau^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a nov…

    Submitted 24 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023

  48. arXiv:2302.02392  [pdf, ps, other]

    cs.LG stat.ML

    Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

    Authors: Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage…

    Submitted 13 November, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: The original title of this paper was "Refined Value-Based Offline RL under Realizability and Partial Coverage," but it was later changed. This paper has been accepted for NeurIPS 2023

  49. arXiv:2301.12366  [pdf, other]

    cs.LG cs.AI math.OC math.ST

    Smooth Non-Stationary Bandits

    Authors: Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier

    Abstract: In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time. However, in practice, environments often change smoothly, so such algorithms may incur higher-tha…

    Submitted 17 November, 2024; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: Accepted by ICML 2023

  50. arXiv:2212.14411  [pdf, other]

    stat.ME econ.EM math.ST stat.ML

    Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations

    Authors: Aurelien Bibaut, Nathan Kallus, Michael Lindon

    Abstract: Sequential tests and their implied confidence sequences, which are valid at arbitrary stopping times, promise flexible statistical inference and on-the-fly decision making. However, strong guarantees are limited to parametric sequential tests that under-cover in practice or concentration-bound-based sequences that over-cover and have suboptimal rejection times. In this work, we consider classic de…

    Submitted 11 March, 2024; v1 submitted 29 December, 2022; originally announced December 2022.