-
Demystifying Spectral Feature Learning for Instrumental Variable Regression
Authors:
Dimitri Meunier,
Antoine Moulin,
Jakub Wornbard,
Vladimir R. Kostic,
Arthur Gretton
Abstract:
We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression. A leading strategy employs spectral features - that is, learned features spanning the top eigensubspaces of the operator linking treatments to instruments. We derive a generalization error bound for a two-stage least squares estimator based on spectra…
▽ More
We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression. A leading strategy employs spectral features - that is, learned features spanning the top eigensubspaces of the operator linking treatments to instruments. We derive a generalization error bound for a two-stage least squares estimator based on spectral features, and gain insights into the method's performance and failure modes. We show that performance depends on two key factors, leading to a clear taxonomy of outcomes. In a good scenario, the approach is optimal. This occurs with strong spectral alignment, meaning the structural function is well-represented by the top eigenfunctions of the conditional operator, coupled with this operator's slow eigenvalue decay, indicating a strong instrument. Performance degrades in a bad scenario: spectral alignment remains strong, but rapid eigenvalue decay (indicating a weaker instrument) demands significantly more samples for effective feature learning. Finally, in the ugly scenario, weak spectral alignment causes the method to fail, regardless of the eigenvalues' characteristics. Our synthetic experiments empirically validate this taxonomy.
△ Less
Submitted 12 June, 2025;
originally announced June 2025.
-
When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses
Authors:
Antoine Moulin,
Emmanuel Esposito,
Dirk van der Hoeven
Abstract:
We consider the problem setting of prediction with expert advice with possibly heavy-tailed losses, i.e.\ the only assumption on the losses is an upper bound on their second moments, denoted by $θ$. We develop adaptive algorithms that do not require any prior knowledge about the range or the second moment of the losses. Existing adaptive algorithms have what is typically considered a lower-order t…
▽ More
We consider the problem setting of prediction with expert advice with possibly heavy-tailed losses, i.e.\ the only assumption on the losses is an upper bound on their second moments, denoted by $θ$. We develop adaptive algorithms that do not require any prior knowledge about the range or the second moment of the losses. Existing adaptive algorithms have what is typically considered a lower-order term in their regret guarantees. We show that this lower-order term, which is often the maximum of the losses, can actually dominate the regret bound in our setting. Specifically, we show that even with small constant $θ$, this lower-order term can scale as $\sqrt{KT}$, where $K$ is the number of experts and $T$ is the time horizon. We propose adaptive algorithms with improved regret bounds that avoid the dependence on such a lower-order term and guarantee $\mathcal{O}(\sqrt{θT\log(K)})$ regret in the worst case, and $\mathcal{O}(θ\log(KT)/Δ_{\min})$ regret when the losses are sampled i.i.d.\ from some fixed distribution, where $Δ_{\min}$ is the difference between the mean losses of the second best expert and the best expert. Additionally, when the loss function is the squared loss, our algorithm also guarantees improved regret bounds over prior results.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^π$-Realizable MDPs
Authors:
Antoine Moulin,
Gergely Neu,
Luca Viano
Abstract:
We study the problem of offline imitation learning in Markov decision processes (MDPs), where the goal is to learn a well-performing policy given a dataset of state-action pairs generated by an expert policy. Complementing a recent line of work on this topic that assumes the expert belongs to a tractable class of known policies, we approach this problem from a new angle and leverage a different ty…
▽ More
We study the problem of offline imitation learning in Markov decision processes (MDPs), where the goal is to learn a well-performing policy given a dataset of state-action pairs generated by an expert policy. Complementing a recent line of work on this topic that assumes the expert belongs to a tractable class of known policies, we approach this problem from a new angle and leverage a different type of structural assumption about the environment. Specifically, for the class of linear $Q^π$-realizable MDPs, we introduce a new algorithm called saddle-point offline imitation learning (\SPOIL), which is guaranteed to match the performance of any expert up to an additive error $\varepsilon$ with access to $\mathcal{O}(\varepsilon^{-2})$ samples. Moreover, we extend this result to possibly non-linear $Q^π$-realizable MDPs at the cost of a worse sample complexity of order $\mathcal{O}(\varepsilon^{-4})$. Finally, our analysis suggests a new loss function for training critic networks from expert data in deep imitation learning. Empirical evaluations on standard benchmarks demonstrate that the neural net implementation of \SPOIL is superior to behavior cloning and competitive with state-of-the-art algorithms.
△ Less
Submitted 2 June, 2025; v1 submitted 26 May, 2025;
originally announced May 2025.
-
Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning
Authors:
Antoine Moulin,
Gergely Neu,
Luca Viano
Abstract:
We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving near-optimal regret guarantees in this setting. Our main idea is to combine two classic techniques for optimistic exploration: additive exploration bonuses applied to the reward function, and artificial transition…
▽ More
We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving near-optimal regret guarantees in this setting. Our main idea is to combine two classic techniques for optimistic exploration: additive exploration bonuses applied to the reward function, and artificial transitions made to an absorbing state with maximal return. We show that, combined with a regularized approximate dynamic-programming scheme, the resulting algorithm achieves a regret of order $\tilde{\mathcal{O}} (\sqrt{d^3 (1 - γ)^{- 7 / 2} T})$, where $T$ is the total number of sample transitions, $γ\in (0,1)$ is the discount factor, and $d$ is the feature dimensionality. The results continue to hold against adversarial reward sequences, enabling application of our method to the problem of imitation learning in linear MDPs, where we achieve state-of-the-art results.
△ Less
Submitted 19 February, 2025;
originally announced February 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…
▽ More
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
△ Less
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Spectral Representation for Causal Estimation with Hidden Confounders
Authors:
Haotian Sun,
Antoine Moulin,
Tongzheng Ren,
Arthur Gretton,
Bo Dai
Abstract:
We address the problem of causal effect estimation where hidden confounders are present, with a focus on two settings: instrumental variable regression with additional observed confounders, and proxy causal learning. Our approach uses a singular value decomposition of a conditional expectation operator, followed by a saddle-point optimization problem, which, in the context of IV regression, can be…
▽ More
We address the problem of causal effect estimation where hidden confounders are present, with a focus on two settings: instrumental variable regression with additional observed confounders, and proxy causal learning. Our approach uses a singular value decomposition of a conditional expectation operator, followed by a saddle-point optimization problem, which, in the context of IV regression, can be thought of as a neural net generalization of the seminal approach due to Darolles et al. [2011]. Saddle-point formulations have gathered considerable attention recently, as they can avoid double sampling bias and are amenable to modern function approximation methods. We provide experimental validation in various settings, and show that our approach outperforms existing methods on common benchmarks.
△ Less
Submitted 10 March, 2025; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Optimistic Planning by Regularized Dynamic Programming
Authors:
Antoine Moulin,
Gergely Neu
Abstract:
We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particula…
▽ More
We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.
△ Less
Submitted 14 June, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Learning to simulate realistic limit order book markets from data as a World Agent
Authors:
Andrea Coletta,
Aymeric Moulin,
Svitlana Vyetrenko,
Tucker Balch
Abstract:
Multi-agent market simulators usually require careful calibration to emulate real markets, which includes the number and the type of agents. Poorly calibrated simulators can lead to misleading conclusions, potentially causing severe loss when employed by investment banks, hedge funds, and traders to study and evaluate trading strategies. In this paper, we propose a world model simulator that accur…
▽ More
Multi-agent market simulators usually require careful calibration to emulate real markets, which includes the number and the type of agents. Poorly calibrated simulators can lead to misleading conclusions, potentially causing severe loss when employed by investment banks, hedge funds, and traders to study and evaluate trading strategies. In this paper, we propose a world model simulator that accurately emulates a limit order book market -- it requires no agent calibration but rather learns the simulated market behavior directly from historical data. Traditional approaches fail short to learn and calibrate trader population, as historical labeled data with details on each individual trader strategy is not publicly available. Our approach proposes to learn a unique "world" agent from historical data. It is intended to emulate the overall trader population, without the need of making assumptions about individual market agent strategies. We implement our world agent simulator models as a Conditional Generative Adversarial Network (CGAN), as well as a mixture of parametric distributions, and we compare our models against previous work. Qualitatively and quantitatively, we show that the proposed approaches consistently outperform previous work, providing more realism and responsiveness.
△ Less
Submitted 26 September, 2022;
originally announced October 2022.
-
CTMSTOU driven markets: simulated environment for regime-awareness in trading policies
Authors:
Selim Amrouni,
Aymeric Moulin,
Tucker Balch
Abstract:
Market regimes is a popular topic in quantitative finance even though there is little consensus on the details of how they should be defined. They arise as a feature both in financial market prediction problems and financial market task performing problems.
In this work we use discrete event time multi-agent market simulation to freely experiment in a reproducible and understandable environment…
▽ More
Market regimes is a popular topic in quantitative finance even though there is little consensus on the details of how they should be defined. They arise as a feature both in financial market prediction problems and financial market task performing problems.
In this work we use discrete event time multi-agent market simulation to freely experiment in a reproducible and understandable environment where regimes can be explicitly switched and enforced.
We introduce a novel stochastic process to model the fundamental value perceived by market participants: Continuous-Time Markov Switching Trending Ornstein-Uhlenbeck (CTMSTOU), which facilitates the study of trading policies in regime switching markets.
We define the notion of regime-awareness for a trading agent as well and illustrate its importance through the study of different order placement strategies in the context of order execution problems.
△ Less
Submitted 3 February, 2022; v1 submitted 2 February, 2022;
originally announced February 2022.
-
ABIDES-Gym: Gym Environments for Multi-Agent Discrete Event Simulation and Application to Financial Markets
Authors:
Selim Amrouni,
Aymeric Moulin,
Jared Vann,
Svitlana Vyetrenko,
Tucker Balch,
Manuela Veloso
Abstract:
Model-free Reinforcement Learning (RL) requires the ability to sample trajectories by taking actions in the original problem environment or a simulated version of it. Breakthroughs in the field of RL have been largely facilitated by the development of dedicated open source simulators with easy to use frameworks such as OpenAI Gym and its Atari environments. In this paper we propose to use the Open…
▽ More
Model-free Reinforcement Learning (RL) requires the ability to sample trajectories by taking actions in the original problem environment or a simulated version of it. Breakthroughs in the field of RL have been largely facilitated by the development of dedicated open source simulators with easy to use frameworks such as OpenAI Gym and its Atari environments. In this paper we propose to use the OpenAI Gym framework on discrete event time based Discrete Event Multi-Agent Simulation (DEMAS). We introduce a general technique to wrap a DEMAS simulator into the Gym framework. We expose the technique in detail and implement it using the simulator ABIDES as a base. We apply this work by specifically using the markets extension of ABIDES, ABIDES-Markets, and develop two benchmark financial markets OpenAI Gym environments for training daily investor and execution agents. As a result, these two environments describe classic financial problems with a complex interactive market behavior response to the experimental agent's action.
△ Less
Submitted 27 October, 2021;
originally announced October 2021.
-
Towards Realistic Market Simulations: a Generative Adversarial Networks Approach
Authors:
Andrea Coletta,
Matteo Prata,
Michele Conti,
Emanuele Mercanti,
Novella Bartolini,
Aymeric Moulin,
Svitlana Vyetrenko,
Tucker Balch
Abstract:
Simulated environments are increasingly used by trading firms and investment banks to evaluate trading strategies before approaching real markets. Backtesting, a widely used approach, consists of simulating experimental strategies while replaying historical market scenarios. Unfortunately, this approach does not capture the market response to the experimental agents' actions. In contrast, multi-ag…
▽ More
Simulated environments are increasingly used by trading firms and investment banks to evaluate trading strategies before approaching real markets. Backtesting, a widely used approach, consists of simulating experimental strategies while replaying historical market scenarios. Unfortunately, this approach does not capture the market response to the experimental agents' actions. In contrast, multi-agent simulation presents a natural bottom-up approach to emulating agent interaction in financial markets. It allows to set up pools of traders with diverse strategies to mimic the financial market trader population, and test the performance of new experimental strategies. Since individual agent-level historical data is typically proprietary and not available for public use, it is difficult to calibrate multiple market agents to obtain the realism required for testing trading strategies. To addresses this challenge we propose a synthetic market generator based on Conditional Generative Adversarial Networks (CGANs) trained on real aggregate-level historical data. A CGAN-based "world" agent can generate meaningful orders in response to an experimental agent. We integrate our synthetic market generator into ABIDES, an open source simulator of financial markets. By means of extensive simulations we show that our proposal outperforms previous work in terms of stylized facts reflecting market responsiveness and realism.
△ Less
Submitted 25 October, 2021;
originally announced October 2021.