-
Accelerating Nash Learning from Human Feedback via Mirror Prox
Authors:
Daniil Tiapkin,
Daniele Calandriello,
Denis Belomestny,
Eric Moulines,
Alexey Naumov,
Kashif Rasul,
Michal Valko,
Pierre Menard
Abstract:
Traditional Reinforcement Learning from Human Feedback (RLHF) often relies on reward models, frequently assuming preference structures like the Bradley-Terry model, which may not accurately capture the complexities of real human preferences (e.g., intransitivity). Nash Learning from Human Feedback (NLHF) offers a more direct alternative by framing the problem as finding a Nash equilibrium of a gam…
▽ More
Traditional Reinforcement Learning from Human Feedback (RLHF) often relies on reward models, frequently assuming preference structures like the Bradley-Terry model, which may not accurately capture the complexities of real human preferences (e.g., intransitivity). Nash Learning from Human Feedback (NLHF) offers a more direct alternative by framing the problem as finding a Nash equilibrium of a game defined by these preferences. In this work, we introduce Nash Mirror Prox ($\mathtt{Nash-MP}$), an online NLHF algorithm that leverages the Mirror Prox optimization scheme to achieve fast and stable convergence to the Nash equilibrium. Our theoretical analysis establishes that Nash-MP exhibits last-iterate linear convergence towards the $β$-regularized Nash equilibrium. Specifically, we prove that the KL-divergence to the optimal policy decreases at a rate of order $(1+2β)^{-N/2}$, where $N$ is a number of preference queries. We further demonstrate last-iterate linear convergence for the exploitability gap and uniformly for the span semi-norm of log-probabilities, with all these rates being independent of the size of the action space. Furthermore, we propose and analyze an approximate version of Nash-MP where proximal steps are estimated using stochastic policy gradients, making the algorithm closer to applications. Finally, we detail a practical implementation strategy for fine-tuning large language models and present experiments that demonstrate its competitive performance and compatibility with existing methods.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
-
Recurrent Interpolants for Probabilistic Time Series Prediction
Authors:
Yu Chen,
Marin Biloš,
Sarthak Mittal,
Wei Deng,
Kashif Rasul,
Anderson Schneider
Abstract:
Sequential models like recurrent neural networks and transformers have become standard for probabilistic multivariate time series forecasting across various domains. Despite their strengths, they struggle with capturing high-dimensional distributions and cross-feature dependencies. Recent work explores generative approaches using diffusion or flow-based models, extending to time series imputation…
▽ More
Sequential models like recurrent neural networks and transformers have become standard for probabilistic multivariate time series forecasting across various domains. Despite their strengths, they struggle with capturing high-dimensional distributions and cross-feature dependencies. Recent work explores generative approaches using diffusion or flow-based models, extending to time series imputation and forecasting. However, scalability remains a challenge. This work proposes a novel method combining recurrent neural networks' efficiency with diffusion models' probabilistic modeling, based on stochastic interpolants and conditional generation with control features, offering insights for future developments in this dynamic field.
△ Less
Submitted 4 October, 2024; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Forecasting with Hyper-Trees
Authors:
Alexander März,
Kashif Rasul
Abstract:
We introduce the concept of Hyper-Trees and offer a new direction in applying tree-based models to time series data. Unlike conventional applications of decision trees that forecast time series directly, Hyper-Trees are designed to learn the parameters of time series models. Our framework combines the effectiveness of gradient boosted trees on tabular data with the advantages of established time s…
▽ More
We introduce the concept of Hyper-Trees and offer a new direction in applying tree-based models to time series data. Unlike conventional applications of decision trees that forecast time series directly, Hyper-Trees are designed to learn the parameters of time series models. Our framework combines the effectiveness of gradient boosted trees on tabular data with the advantages of established time series models, thereby naturally inducing a time series inductive bias to tree models. By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity. To resolve the inherent scaling issue of boosted trees when estimating a large number of target model parameters, we combine decision trees and neural networks within a unified framework. In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters. With our research, we aim to explore the effectiveness of Hyper-Trees across various forecasting scenarios and to extend the application of gradient boosted trees outside their conventional use in time series modeling.
△ Less
Submitted 14 October, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Intrinsic Anomaly Detection for Multi-Variate Time Series
Authors:
Stephan Rabanser,
Tim Januschowski,
Kashif Rasul,
Oliver Borchert,
Richard Kurle,
Jan Gasthaus,
Michael Bohlke-Schneider,
Nicolas Papernot,
Valentin Flunkert
Abstract:
We introduce a novel, practically relevant variation of the anomaly detection problem in multi-variate time series: intrinsic anomaly detection. It appears in diverse practical scenarios ranging from DevOps to IoT, where we want to recognize failures of a system that operates under the influence of a surrounding environment. Intrinsic anomalies are changes in the functional dependency structure be…
▽ More
We introduce a novel, practically relevant variation of the anomaly detection problem in multi-variate time series: intrinsic anomaly detection. It appears in diverse practical scenarios ranging from DevOps to IoT, where we want to recognize failures of a system that operates under the influence of a surrounding environment. Intrinsic anomalies are changes in the functional dependency structure between time series that represent an environment and time series that represent the internal state of a system that is placed in said environment. We formalize this problem, provide under-studied public and new purpose-built data sets for it, and present methods that handle intrinsic anomaly detection. These address the short-coming of existing anomaly detection methods that cannot differentiate between expected changes in the system's state and unexpected ones, i.e., changes in the system that deviate from the environment's influence. Our most promising approach is fully unsupervised and combines adversarial learning and time series representation learning, thereby addressing problems such as label sparsity and subjectivity, while allowing to navigate and improve notoriously problematic anomaly detection data sets.
△ Less
Submitted 28 June, 2022;
originally announced June 2022.
-
Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows
Authors:
Kashif Rasul,
Abdul-Saboor Sheikh,
Ingmar Schuster,
Urs Bergmann,
Roland Vollgraf
Abstract:
Time series forecasting is often fundamental to scientific and engineering problems and enables decision making. With ever increasing data set sizes, a trivial solution to scale up predictions is to assume independence between interacting time series. However, modeling statistical dependencies can improve accuracy and enable analysis of interaction effects. Deep learning methods are well suited fo…
▽ More
Time series forecasting is often fundamental to scientific and engineering problems and enables decision making. With ever increasing data set sizes, a trivial solution to scale up predictions is to assume independence between interacting time series. However, modeling statistical dependencies can improve accuracy and enable analysis of interaction effects. Deep learning methods are well suited for this problem, but multivariate models often assume a simple parametric distribution and do not scale to high dimensions. In this work we model the multivariate temporal dynamics of time series via an autoregressive deep learning model, where the data distribution is represented by a conditioned normalizing flow. This combination retains the power of autoregressive models, such as good performance in extrapolation into the future, with the flexibility of flows as a general purpose high-dimensional distribution model, while remaining computationally tractable. We show that it improves over the state-of-the-art for standard metrics on many real-world data sets with several thousand interacting time-series.
△ Less
Submitted 14 January, 2021; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Set Flow: A Permutation Invariant Normalizing Flow
Authors:
Kashif Rasul,
Ingmar Schuster,
Roland Vollgraf,
Urs Bergmann
Abstract:
We present a generative model that is defined on finite sets of exchangeable, potentially high dimensional, data. As the architecture is an extension of RealNVPs, it inherits all its favorable properties, such as being invertible and allowing for exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, learn statistical dependencies…
▽ More
We present a generative model that is defined on finite sets of exchangeable, potentially high dimensional, data. As the architecture is an extension of RealNVPs, it inherits all its favorable properties, such as being invertible and allowing for exact log-likelihood evaluation. We show that this architecture is able to learn finite non-i.i.d. set data distributions, learn statistical dependencies between entities of the set and is able to train and sample with variable set sizes in a computationally efficient manner. Experiments on 3D point clouds show state-of-the art likelihoods.
△ Less
Submitted 6 September, 2019;
originally announced September 2019.
-
A Bandit Framework for Optimal Selection of Reinforcement Learning Agents
Authors:
Andreas Merentitis,
Kashif Rasul,
Roland Vollgraf,
Abdul-Saboor Sheikh,
Urs Bergmann
Abstract:
Deep Reinforcement Learning has been shown to be very successful in complex games, e.g. Atari or Go. These games have clearly defined rules, and hence allow simulation. In many practical applications, however, interactions with the environment are costly and a good simulator of the environment is not available. Further, as environments differ by application, the optimal inductive bias (architectur…
▽ More
Deep Reinforcement Learning has been shown to be very successful in complex games, e.g. Atari or Go. These games have clearly defined rules, and hence allow simulation. In many practical applications, however, interactions with the environment are costly and a good simulator of the environment is not available. Further, as environments differ by application, the optimal inductive bias (architecture, hyperparameters, etc.) of a reinforcement agent depends on the application. In this work, we propose a multi-arm bandit framework that selects from a set of different reinforcement learning agents to choose the one with the best inductive bias. To alleviate the problem of sparse rewards, the reinforcement learning agents are augmented with surrogate rewards. This helps the bandit framework to select the best agents early, since these rewards are smoother and less sparse than the environment reward. The bandit has the double objective of maximizing the reward while the agents are learning and selecting the best agent after a finite number of learning steps. Our experimental results on standard environments show that the proposed framework is able to consistently select the optimal agent after a finite number of steps, while collecting more cumulative reward compared to selecting a sub-optimal architecture or uniformly alternating between different agents.
△ Less
Submitted 10 February, 2019;
originally announced February 2019.
-
Stochastic Maximum Likelihood Optimization via Hypernetworks
Authors:
Abdul-Saboor Sheikh,
Kashif Rasul,
Andreas Merentitis,
Urs Bergmann
Abstract:
This work explores maximum likelihood optimization of neural networks through hypernetworks. A hypernetwork initializes the weights of another network, which in turn can be employed for typical functional tasks such as regression and classification. We optimize hypernetworks to directly maximize the conditional likelihood of target variables given input. Using this approach we obtain competitive e…
▽ More
This work explores maximum likelihood optimization of neural networks through hypernetworks. A hypernetwork initializes the weights of another network, which in turn can be employed for typical functional tasks such as regression and classification. We optimize hypernetworks to directly maximize the conditional likelihood of target variables given input. Using this approach we obtain competitive empirical results on regression and classification benchmarks.
△ Less
Submitted 12 January, 2018; v1 submitted 4 December, 2017;
originally announced December 2017.
-
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Authors:
Han Xiao,
Kashif Rasul,
Roland Vollgraf
Abstract:
We present Fashion-MNIST, a new dataset comprising of 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image s…
▽ More
We present Fashion-MNIST, a new dataset comprising of 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist
△ Less
Submitted 15 September, 2017; v1 submitted 25 August, 2017;
originally announced August 2017.