-
Control Tax: The Price of Keeping AI in Check
Authors:
Mikhail Terekhov,
Zhen Ning David Liu,
Caglar Gulcehre,
Samuel Albanie
Abstract:
The rapid integration of agentic AI into high-stakes real-world applications requires robust oversight mechanisms. The emerging field of AI Control (AIC) aims to provide such an oversight mechanism, but practical adoption depends heavily on implementation overhead. To study this problem better, we introduce the notion of Control Tax -- the operational and financial cost of integrating control measures into AI pipelines. Our work makes three key contributions to the field of AIC: (1) we introduce a theoretical framework that quantifies the Control Tax and maps classifier performance to safety assurances; (2) we conduct comprehensive evaluations of state-of-the-art language models in adversarial settings, where attacker models insert subtle backdoors into code while monitoring models attempt to detect these vulnerabilities; and (3) we provide empirical financial cost estimates for control protocols and develop optimized monitoring strategies that balance safety and cost-effectiveness while accounting for practical constraints like auditing budgets. Our framework enables practitioners to make informed decisions by systematically connecting safety guarantees with their costs, advancing AIC through principled economic feasibility assessment across different deployment contexts.
Submitted 5 June, 2025;
originally announced June 2025.
-
Physical knowledge improves prediction of EM Fields
Authors:
Andrzej Dulny,
Farzad Jabbarigargari,
Andreas Hotho,
Laura Maria Schreiber,
Maxim Terekhov,
Anna Krause
Abstract:
We propose a 3D U-Net model to predict the spatial distribution of electromagnetic fields inside a radio-frequency (RF) coil with a subject present, using the phase, amplitude, and position of the coils, along with the density, permittivity, and conductivity of the surrounding medium as inputs. To improve accuracy, we introduce a physics-augmented variant, U-Net Phys, which incorporates Gauss's law for magnetism into the loss function using finite differences. We train our models on electromagnetic field simulations from CST Studio Suite for an eight-channel dipole array RF coil at 7T MRI. Experimental results show that U-Net Phys significantly outperforms the standard U-Net, particularly in predicting fields within the subject, demonstrating the advantage of integrating physical constraints into deep learning-based field prediction.
Submitted 12 March, 2025;
originally announced March 2025.
-
Regret-Optimized Portfolio Enhancement through Deep Reinforcement Learning and Future Looking Rewards
Authors:
Daniil Karzanov,
Rubén Garzón,
Mikhail Terekhov,
Caglar Gulcehre,
Thomas Raffinot,
Marcin Detyniecki
Abstract:
This paper introduces a novel agent-based approach for enhancing existing portfolio strategies using Proximal Policy Optimization (PPO). Rather than focusing solely on traditional portfolio construction, our approach aims to improve an already high-performing strategy through dynamic rebalancing driven by PPO and Oracle agents. Our target is to enhance the traditional 60/40 benchmark (60% stocks, 40% bonds) by employing the Regret-based Sharpe reward function. To address the impact of transaction fee frictions and prevent signal loss, we develop a transaction cost scheduler. We introduce a future-looking reward function and employ synthetic data training through a circular block bootstrap method to facilitate the learning of generalizable allocation strategies. We focus on two key evaluation measures: return and maximum drawdown. Given the high stochasticity of financial markets, we train 20 independent agents each period and evaluate their average performance against the benchmark. Our method not only enhances the performance of the existing portfolio strategy through strategic rebalancing but also demonstrates strong results compared to other baselines.
Submitted 4 February, 2025;
originally announced February 2025.
-
Contextual Bandit Optimization with Pre-Trained Neural Networks
Authors:
Mikhail Terekhov
Abstract:
Bandit optimization is a difficult problem, especially if the reward model is high-dimensional. When rewards are modeled by neural networks, sublinear regret has only been shown under strong assumptions, usually when the network is extremely wide. In this thesis, we investigate how pre-training can help us in the regime of smaller models. We consider a stochastic contextual bandit with the rewards modeled by a multi-layer neural network. The last layer is a linear predictor, and the layers before it are a black box neural architecture, which we call a representation network. We model pre-training as an initial guess of the weights of the representation network provided to the learner. To leverage the pre-trained weights, we introduce a novel algorithm we call Explore Twice then Commit (E2TC). During its two stages of exploration, the algorithm first estimates the last layer's weights using Ridge regression, and then runs Stochastic Gradient Descent jointly on all the weights. For a locally convex loss function, we provide conditions on the pre-trained weights under which the algorithm can learn efficiently. Under these conditions, we show sublinear regret of E2TC when the dimension of the last layer and number of actions $K$ are much smaller than the horizon $T$. In the weak training regime, when only the last layer is learned, the problem reduces to a misspecified linear bandit. We introduce a measure of misspecification $ε_0$ for this bandit and use it to provide bounds $O(ε_0\sqrt{d}KT+(KT)^{4 /5})$ or $\tilde{O}(ε_0\sqrt{d}KT+d^{1 /3}(KT)^{2 /3})$ on the regret, depending on regularization strength. The first of these bounds has a dimension-independent sublinear term, made possible by the stochasticity of contexts. We also run experiments to evaluate the regret of E2TC and sample complexity of its exploration in practice.
Submitted 9 January, 2025;
originally announced January 2025.
-
One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
Authors:
Viacheslav Surkov,
Chris Wendler,
Antonio Mari,
Mikhail Terekhov,
Justin Deschenaux,
Robert West,
Caglar Gulcehre,
David Bau
Abstract:
For large language models (LLMs), sparse autoencoders (SAEs) have been shown to decompose intermediate representations that often are not interpretable directly into sparse sums of interpretable features, facilitating better control and subsequent analysis. However, similar analyses and approaches have been lacking for text-to-image models. We investigate the possibility of using SAEs to learn interpretable features for SDXL Turbo, a few-step text-to-image diffusion model. To this end, we train SAEs on the updates performed by transformer blocks within SDXL Turbo's denoising U-net in its 1-step setting. Interestingly, we find that they generalize to 4-step SDXL Turbo and even to the multi-step SDXL base model (i.e., a different model) without additional training. In addition, we show that their learned features are interpretable, causally influence the generation process, and reveal specialization among the blocks. We do so by creating RIEBench, a representation-based image editing benchmark, for editing images while they are generated by turning on and off individual SAE features. This allows us to track which transformer blocks' features are the most impactful depending on the edit category. Our work is the first investigation of SAEs for interpretability in text-to-image diffusion models and our results establish SAEs as a promising approach for understanding and manipulating the internal mechanisms of text-to-image models.
Submitted 30 May, 2025; v1 submitted 28 October, 2024;
originally announced October 2024.
-
In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
Authors:
Mikhail Terekhov,
Caglar Gulcehre
Abstract:
Multi-objective reinforcement learning (MORL) is essential for addressing the intricacies of real-world RL problems, which often require trade-offs between multiple utility functions. However, MORL is challenging due to unstable learning dynamics with deep learning-based function approximators. The research path most taken has been to explore different value-based loss functions for MORL to overcome this issue. Our work empirically explores model-free policy learning loss functions and the impact of different architectural choices. We introduce two different approaches: Multi-objective Proximal Policy Optimization (MOPPO), which extends PPO to MORL, and Multi-objective Advantage Actor Critic (MOA2C), which acts as a simple baseline in our ablations. Our proposed approach is straightforward to implement, requiring only small modifications at the level of the function approximator. We conduct comprehensive evaluations on the MORL Deep Sea Treasure, Minecart, and Reacher environments and show that MOPPO effectively captures the Pareto front. Our extensive ablation studies and empirical analyses reveal the impact of different architectural choices, underscoring the robustness and versatility of MOPPO compared to popular MORL approaches like Pareto Conditioned Networks (PCN) and Envelope Q-learning in terms of MORL metrics, including hypervolume and expected utility.
Submitted 23 July, 2024;
originally announced July 2024.
-
Learning pure quantum states (almost) without regret
Authors:
Josep Lumbreras,
Mikhail Terekhov,
Marco Tomamichel
Abstract:
We initiate the study of sample-optimal quantum state tomography with minimal disturbance to the samples. Can we efficiently learn a precise description of a quantum state through sequential measurements of samples while at the same time making sure that the post-measurement state of the samples is only minimally perturbed? Defining regret as the cumulative disturbance of all samples, the challenge is to find a balance between the most informative sequence of measurements on the one hand and measurements incurring minimal regret on the other. Here we answer this question for qubit states by exhibiting a protocol that for pure states achieves maximal precision while incurring a regret that grows only polylogarithmically with the number of samples, a scaling that we show to be optimal.
Submitted 5 June, 2025; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Invariant systems of weighted representatives
Authors:
Anton A. Klyachko,
Mikhail S. Terekhov
Abstract:
It is known that, if removing some $n$ edges from a graph $Γ$ destroys all subgraphs isomorphic to a given finite graph $K$, then all subgraphs isomorphic to $K$ can be destroyed by removing at most $|E(K)|\cdot n$ edges, which form a set invariant with respect to all automorphisms of $Γ$. We construct the first examples of (connected) graphs $K$ for which this estimate is not sharp. Our arguments are based on a "weighted analogue" of an earlier known estimate for the cost of symmetry.
Submitted 19 January, 2025; v1 submitted 20 June, 2023;
originally announced June 2023.
-
The cost of symmetry in connected graphs
Authors:
M. S. Terekhov
Abstract:
The paper answers the question posed in a joint paper by A. A. Klyachko and N. M. Luneva about the optimality of the estimate for the cost of symmetry in graphs. The original estimate says that if $n$ vertices can be removed from a connected graph so that no connected subgraph isomorphic to $Γ$ is left in it, then at most $n|V(Γ)|$ vertices that form a set invariant under all automorphisms of the graph can be removed so that the graph does not contain a subgraph isomorphic to $Γ$. We prove that there exists a graph $Γ$ for which this estimate is not optimal.
Submitted 18 June, 2022; v1 submitted 19 February, 2022;
originally announced February 2022.
-
An adaptive numerical method for free surface flows passing rigidly mounted obstacles
Authors:
Kirill D. Nikitin,
Maxim A. Olshanskii,
Kirill M. Terekhov,
Yuri V. Vassilevski,
Ruslan Yanbarisov
Abstract:
The paper develops a method for the numerical simulation of a free-surface flow of incompressible viscous fluid around a streamlined body. The body is a rigid stationary construction partially submerged in the fluid. The application of interest in this paper is a flow around a surface-mounted offshore oil platform. The numerical method builds on a hybrid finite volume / finite difference discretization using adaptive octree cubic meshes. The mesh is dynamically refined towards the free surface and the construction. Special care is taken to devise a discretization for the case of curvilinear boundaries and interfaces immersed in the octree Cartesian background computational mesh. To demonstrate the accuracy of the method, we show the results for two benchmark problems: sloshing in a 3D container and laminar channel flow past a 3D cylinder of circular cross-section. Further, we numerically simulate a flow with surface waves around an offshore oil platform for a realistic set of geophysical data.
Submitted 9 February, 2017; v1 submitted 18 September, 2016;
originally announced September 2016.
-
Ultrasensitive 3He magnetometer for measurements of high magnetic fields
Authors:
A. Nikiel,
P. Blümler,
W. Heil,
M. Hehn,
S. Karpuk,
A. Maul,
E. Otten,
L. M. Schreiber,
M. Terekhov
Abstract:
We describe a 3He magnetometer capable of measuring high magnetic fields (B > 0.1 Tesla) with a relative accuracy of better than 10^-12. Our approach is based on the measurement of the free induction decay of gaseous, nuclear spin polarized 3He following a resonant radio frequency pulse excitation. The measurement sensitivity can be attributed to the long coherent spin precession time T2*, on the order of minutes, which is achieved for spherical sample cells in the regime of motional narrowing where the disturbing influence of field inhomogeneities is strongly suppressed. The 3He gas is spin polarized in-situ using a new, non-standard variant of the metastability exchange optical pumping. We show that miniaturization helps to increase T2* further and that the measurement sensitivity is not significantly affected by temporal field fluctuations of order 10^-4.
Submitted 27 May, 2014;
originally announced May 2014.
-
3rd Interplanetary Network Localization, Time History, Fluence, Peak Flux, and Distance Lower Limit of the February 28, 1997 Gamma-Ray Burst
Authors:
K. Hurley,
E. Costa,
M. Feroci,
F. Frontera,
T. Cline,
D. Dal Fiume,
M. Orlandini,
M. Boer,
E. Mazets,
R. Aptekar,
S. Golenetskii,
M. Terekhov
Abstract:
The gamma-ray burst of 1997 February 28 was localized using the arrival-time analysis method with the Ulysses, BeppoSAX, and WIND spacecraft. The result is a plus-or-minus 31.5 arcsec (3 sigma) wide annulus of possible arrival directions which intersects both the position of the burst determined independently by the SAX Wide Field Camera, and the position of a fading X-ray source detected by the SAX focussing X-ray telescopes, and reduces these source location areas by factors of 7 and 1.5 respectively. The combination of the annulus and the SAX locations, a 0.76 square arcminute error box, is consistent with that of an optical transient source and an extended object, possibly a galaxy. We also present the time history, peak flux, and fluence of this event, and derive a model-independent lower limit to the source distance of ~11000 AU.
Submitted 16 May, 1997;
originally announced May 1997.