Rademacher learning rates for iterated random functions
Authors:
Nikola Sandrić
Abstract:
Most existing literature on supervised machine learning assumes that the training dataset is drawn from an i.i.d. sample. However, many real-world problems exhibit temporal dependence and strong correlations between the marginal distributions of the data-generating process, suggesting that the i.i.d. assumption is often unrealistic. In such cases, models naturally include time-series processes with mixing properties, as well as irreducible and aperiodic ergodic Markov chains. Moreover, the learning rates typically obtained in these settings are independent of the data distribution, which can lead to restrictive choices of hypothesis classes and suboptimal sample complexities for the learning algorithm. In this article, we consider the case where the training dataset is generated by an iterated random function (i.e., an iteratively defined time-homogeneous Markov chain) that is not necessarily irreducible or aperiodic. Under the assumption that the governing function is contractive with respect to its first argument and subject to certain regularity conditions on the hypothesis class, we first establish a uniform convergence result for the corresponding sample error. We then demonstrate the learnability of the approximate empirical risk minimization algorithm and derive its learning rate bound. Both rates are data-distribution dependent, expressed in terms of the Rademacher complexities of the underlying hypothesis class, allowing them to more accurately reflect the properties of the data-generating distribution.
Submitted 16 June, 2025;
originally announced June 2025.
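The setting in the abstract above (a chain defined by iterating a random function that is contractive in its first argument, with rates expressed through Rademacher complexities) can be illustrated with a small sketch. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the governing function F(x, theta) = x/3 + theta with theta drawn uniformly from {0, 2/3}, the finite class of threshold functions h_t(x) = 1{x <= t}, and the Monte Carlo estimate of the empirical Rademacher complexity.

```python
import random

def step(x, theta):
    """Hypothetical governing function F(x, theta) = x/3 + theta.

    It is contractive in its first argument with Lipschitz constant 1/3.
    """
    return x / 3.0 + theta

def simulate_chain(n, x0=0.0, seed=0):
    """Generate X_1, ..., X_n via X_{k+1} = F(X_k, theta_{k+1})."""
    rng = random.Random(seed)
    xs, x = [], x0
    for _ in range(n):
        # theta is i.i.d. uniform on {0, 2/3} (an illustrative choice).
        x = step(x, rng.choice((0.0, 2.0 / 3.0)))
        xs.append(x)
    return xs

def empirical_rademacher(xs, thresholds, n_draws=200, seed=1):
    """Monte Carlo estimate of E_sigma sup_t (1/n) sum_i sigma_i * 1{x_i <= t}."""
    rng = random.Random(seed)
    n = len(xs)
    total = 0.0
    for _ in range(n_draws):
        # Independent Rademacher signs, one per sample point.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the (finite) hypothesis class of thresholds.
        best = max(
            sum(s for s, x in zip(sigma, xs) if x <= t) / n
            for t in thresholds
        )
        total += best
    return total / n_draws

xs = simulate_chain(500)
# The threshold -1.0 yields the zero function, so the supremum is never negative.
thresholds = [-1.0] + [k / 10 for k in range(11)]
rad = empirical_rademacher(xs, thresholds)
print(f"estimated empirical Rademacher complexity: {rad:.4f}")
```

Because the estimate is computed on the simulated sample itself, it depends on where the chain actually spends its time, which is the sense in which such complexity terms are data-distribution dependent.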
Learning from non-irreducible Markov chains
Authors:
Nikola Sandrić,
Stjepan Šebek
Abstract:
Most of the existing literature on supervised machine learning problems focuses on the case when the training data set is drawn from an i.i.d. sample. However, many practical problems are characterized by temporal dependence and strong correlation between the marginals of the data-generating process, suggesting that the i.i.d. assumption is not always justified. This problem has already been considered in the context of Markov chains satisfying the Doeblin condition. This condition, among other things, implies that the chain is not singular in its behavior, i.e., it is irreducible. In this article, we focus on the case when the training data set is drawn from a not necessarily irreducible Markov chain. Under the assumption that the chain is uniformly ergodic with respect to the $\mathrm{L}^1$-Wasserstein distance, and certain regularity assumptions on the hypothesis class and the state space of the chain, we first obtain a uniform convergence result for the corresponding sample error, and then we conclude learnability of the approximate sample error minimization algorithm and find its generalization bounds. Finally, a relative uniform convergence result for the sample error is also discussed.
Submitted 20 January, 2023; v1 submitted 8 October, 2021;
originally announced October 2021.