Search | arXiv e-print repository

Long-Context Linear System Identification

Authors: Oğuz Kaan Yüksel, Mathieu Even, Nicolas Flammarion

Abstract: This paper addresses the problem of long-context linear system identification, where the state $x_t$ of a dynamical system at time $t$ depends linearly on previous states $x_s$ over a fixed context window of length $p$. We establish a sample complexity bound that matches the i.i.d. parametric rate up to logarithmic factors for a broad class of systems, extending previous works that considered only… ▽ More This paper addresses the problem of long-context linear system identification, where the state $x_t$ of a dynamical system at time $t$ depends linearly on previous states $x_s$ over a fixed context window of length $p$. We establish a sample complexity bound that matches the i.i.d. parametric rate up to logarithmic factors for a broad class of systems, extending previous works that considered only first-order dependencies. Our findings reveal a learning-without-mixing phenomenon, indicating that learning long-context linear autoregressive models is not hindered by slow mixing properties potentially associated with extended context windows. Additionally, we extend these results to (i) shared low-rank representations, where rank-regularized estimators improve the dependence of the rates on the dimensionality, and (ii) misspecified context lengths in strictly stable systems, where shorter contexts offer statistical advantages. △ Less

Submitted 2 July, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

Comments: Published at ICLR 2025. This version includes minor corrections and improved grammar from the published version. 34 pages, 4 figures

Journal ref: O. K. Yüksel, M. Even, N. Flammarion, "Long-Context Linear System Identification", in The Thirteenth International Conference on Learning Representations, 2025

arXiv:2303.01335 [pdf, other]

First-order ANIL provably learns representations despite overparametrization

Authors: Oğuz Kaan Yüksel, Etienne Boursier, Nicolas Flammarion

Abstract: Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialization points from which gradient descent quickly adapts to any new task. Although it has been em… ▽ More Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialization points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods perform well by learning shared representations during pretraining, there is limited theoretical evidence of such behavior. More importantly, it has not been shown that these methods still learn a shared structure, despite architectural misspecifications. In this direction, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations. This result even holds with overparametrization; having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution. The learned solution then yields a good adaptation performance on any new task after a single gradient step. Overall, this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations. △ Less

Submitted 23 July, 2024; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 42 pages, 17 figures

Journal ref: O. K. Yüksel, E. Boursier and N. Flammarion, "First-order ANIL provably learns representations despite overparametrisation", in The Twelfth International Conference on Learning Representations, 2024

arXiv:2108.07958 [pdf, other]

Semantic Perturbations with Normalizing Flows for Improved Generalization

Authors: Oguz Kaan Yuksel, Sebastian U. Stich, Martin Jaggi, Tatjana Chavdarova

Abstract: Data augmentation is a widely adopted technique for avoiding overfitting when training deep neural networks. However, this approach requires domain-specific knowledge and is often limited to a fixed set of hard-coded transformations. Recently, several works proposed to use generative models for generating semantically meaningful perturbations to train a classifier. However, because accurate encodi… ▽ More Data augmentation is a widely adopted technique for avoiding overfitting when training deep neural networks. However, this approach requires domain-specific knowledge and is often limited to a fixed set of hard-coded transformations. Recently, several works proposed to use generative models for generating semantically meaningful perturbations to train a classifier. However, because accurate encoding and decoding are critical, these methods, which use architectures that approximate the latent-variable inference, remained limited to pilot studies on small datasets. Exploiting the exactly reversible encoder-decoder structure of normalizing flows, we perform on-manifold perturbations in the latent space to define fully unsupervised data augmentations. We demonstrate that such perturbations match the performance of advanced data augmentation techniques -- reaching 96.6% test accuracy for CIFAR-10 using ResNet-18 and outperform existing methods, particularly in low data regimes -- yielding 10--25% relative improvement of test accuracy from classical training. We find that our latent adversarial perturbations adaptive to the classifier throughout its training are most effective, yielding the first test accuracy improvement results on real-world datasets -- CIFAR-10/100 -- via latent-space perturbations. △ Less

Submitted 17 August, 2021; originally announced August 2021.

Comments: In Proceedings of the IEEE International Conference on Computer Vision

Showing 1–3 of 3 results for author: Yüksel, O K