-
Operator Learning for Schrödinger Equation: Unitarity, Error Bounds, and Time Generalization
Authors:
Yash Patel,
Unique Subedi,
Ambuj Tewari
Abstract:
We consider the problem of learning the evolution operator for the time-dependent Schrödinger equation, where the Hamiltonian may vary with time. Existing neural network-based surrogates often ignore fundamental properties of the Schrödinger equation, such as linearity and unitarity, and lack theoretical guarantees on prediction error or time generalization. To address this, we introduce a linear estimator for the evolution operator that preserves a weak form of unitarity. We establish both upper and lower bounds on the prediction error that hold uniformly over all sufficiently smooth initial wave functions. Additionally, we derive time generalization bounds that quantify how the estimator extrapolates beyond the time points seen during training. Experiments across real-world Hamiltonians -- including hydrogen atoms, ion traps for qubit design, and optical lattices -- show that our estimator achieves relative errors $10^{-2}$ to $10^{-3}$ times smaller than state-of-the-art methods such as the Fourier Neural Operator and DeepONet.
Submitted 23 May, 2025;
originally announced May 2025.
-
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Authors:
Seamus Somerstep,
Vinod Raman,
Unique Subedi,
Yuekai Sun
Abstract:
Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new next token predictor on good generations. The second method, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in n or a rate of convergence with better dependence on the response length.
Submitted 22 May, 2025;
originally announced May 2025.
-
Operator Learning: A Statistical Perspective
Authors:
Unique Subedi,
Ambuj Tewari
Abstract:
Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators to model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we end by highlighting key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks.
Submitted 4 April, 2025;
originally announced April 2025.
-
Multiclass Transductive Online Learning
Authors:
Steve Hanneke,
Vinod Raman,
Amirreza Shaeiri,
Unique Subedi
Abstract:
We consider the problem of multiclass transductive online learning when the number of labels can be unbounded. Previous works by Ben-David et al. [1997] and Hanneke et al. [2023b] only consider the case of binary and finite label spaces, respectively. The latter work determined that their techniques fail to extend to the case of unbounded label spaces, and they pose the question of characterizing the optimal mistake bound for unbounded label spaces. We answer this question by showing that a new dimension, termed the Level-constrained Littlestone dimension, characterizes online learnability in this setting. Along the way, we show that the trichotomy of possible minimax rates of the expected number of mistakes established by Hanneke et al. [2023b] for finite label spaces in the realizable setting continues to hold even when the label space is unbounded. In particular, if the learner plays for $T \in \mathbb{N}$ rounds, its minimax expected number of mistakes can only grow like $Θ(T)$, $Θ(\log T)$, or $Θ(1)$. To prove this result, we give another combinatorial dimension, termed the Level-constrained Branching dimension, and show that its finiteness characterizes constant minimax expected mistake-bounds. The trichotomy is then determined by a combination of the Level-constrained Littlestone and Branching dimensions. Quantitatively, our upper bounds improve upon existing multiclass upper bounds in Hanneke et al. [2023b] by removing the dependence on the label set size. In doing so, we explicitly construct learning algorithms that can handle extremely large or unbounded label spaces. A key and novel component of our algorithm is a new notion of shattering that exploits the sequential nature of transductive online learning. Finally, we complete our results by proving expected regret bounds in the agnostic setting, extending the result of Hanneke et al. [2023b].
Submitted 3 November, 2024;
originally announced November 2024.
-
On the Benefits of Active Data Collection in Operator Learning
Authors:
Unique Subedi,
Ambuj Tewari
Abstract:
We study active data collection strategies for operator learning when the target operator is linear and the input functions are drawn from a mean-zero stochastic process with continuous covariance kernels. With an active data collection strategy, we establish an error convergence rate in terms of the decay rate of the eigenvalues of the covariance kernel. We can achieve arbitrarily fast error convergence rates with sufficiently rapid eigenvalue decay of the covariance kernels. This contrasts with passive (i.i.d.) data collection strategies, where the convergence rate is never faster than linear decay ($\sim n^{-1}$). In fact, for our setting, we show a \emph{non-vanishing} lower bound for any passive data collection strategy, regardless of the eigenvalue decay rate of the covariance kernel. Overall, our results show the benefit of active data collection strategies in operator learning over their passive counterparts.
Submitted 6 February, 2025; v1 submitted 25 October, 2024;
originally announced October 2024.
-
Controlling Statistical, Discretization, and Truncation Errors in Learning Fourier Linear Operators
Authors:
Unique Subedi,
Ambuj Tewari
Abstract:
We study learning-theoretic foundations of operator learning, using the linear layer of the Fourier Neural Operator architecture as a model problem. First, we identify three main errors that occur during the learning process: statistical error due to finite sample size, truncation error from finite rank approximation of the operator, and discretization error from handling functional data on a finite grid of domain points. Finally, we analyze a Discrete Fourier Transform (DFT) based least squares estimator, establishing both upper and lower bounds on the aforementioned errors.
Submitted 6 February, 2025; v1 submitted 16 August, 2024;
originally announced August 2024.
-
Smoothed Online Classification can be Harder than Batch Classification
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
We study online classification under smoothed adversaries. In this setting, at each time point, the adversary draws an example from a distribution that has a bounded density with respect to a fixed base measure, which is known a priori to the learner. For binary classification and scalar-valued regression, previous works \citep{haghtalab2020smoothed, block2022smoothed} have shown that smoothed online learning is as easy as learning in the i.i.d. batch setting under the PAC model. However, we show that smoothed online classification can be harder than i.i.d. batch classification when the label space is unbounded. In particular, we construct a hypothesis class that is learnable in the i.i.d. batch setting under the PAC model but is not learnable under the smoothed online model. Finally, we identify a condition that ensures that the PAC learnability of a hypothesis class is sufficient for its smoothed online learnability.
Submitted 24 May, 2024;
originally announced May 2024.
-
The Complexity of Sequential Prediction in Dynamical Systems
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
We study the problem of learning to predict the next state of a dynamical system when the underlying evolution function is unknown. Unlike previous work, we place no parametric assumptions on the dynamical system, and study the problem from a learning theory perspective. We define new combinatorial measures and dimensions and show that they quantify the optimal mistake and regret bounds in the realizable and agnostic settings respectively. By doing so, we find that in the realizable setting, the total number of mistakes can grow according to \emph{any} increasing function of the time horizon $T$. In contrast, we show that in the agnostic setting under the commonly studied notion of Markovian regret, the only possible rates are $Θ(T)$ and $\tilde{Θ}(\sqrt{T})$.
Submitted 2 June, 2025; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Apple Tasting: Combinatorial Dimensions and Minimax Rates
Authors:
Vinod Raman,
Unique Subedi,
Ananth Raman,
Ambuj Tewari
Abstract:
In online binary classification under \emph{apple tasting} feedback, the learner only observes the true label if it predicts ``1''. We revisit this classical partial-feedback setting, first studied by \cite{helmbold2000apple}, and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected mistakes in the realizable setting. As a corollary, we use the Effective width to establish a \emph{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes of any learner, under apple tasting feedback, can be $Θ(1)$, $Θ(\sqrt{T})$, or $Θ(T)$. This is in contrast to the full-information realizable setting where only $Θ(1)$ and $Θ(T)$ are possible.
Submitted 18 June, 2024; v1 submitted 29 October, 2023;
originally announced October 2023.
-
Online Infinite-Dimensional Regression: Learning Linear Operators
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
We consider the problem of learning linear operators under squared loss between two infinite-dimensional Hilbert spaces in the online setting. We show that the class of linear operators with uniformly bounded $p$-Schatten norm is online learnable for any $p \in [1, \infty)$. On the other hand, we prove an impossibility result by showing that the class of uniformly bounded linear operators with respect to the operator norm is \textit{not} online learnable. Moreover, we show a separation between sequential uniform convergence and online learnability by identifying a class of bounded linear operators that is online learnable but for which uniform convergence does not hold. Finally, we prove that the impossibility result and the separation between uniform convergence and learnability also hold in the batch setting.
Submitted 24 January, 2024; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Multiclass Online Learnability under Bandit Feedback
Authors:
Ananth Raman,
Vinod Raman,
Unique Subedi,
Idan Mehalel,
Ambuj Tewari
Abstract:
We study online multiclass classification under bandit feedback. We extend the results of Daniely and Helbertal [2013] by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online learnability even when the label space is unbounded. Moreover, we show that, unlike the full-information setting, sequential uniform convergence is necessary but not sufficient for bandit online learnability. Our result complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023] who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting even when the label space is unbounded.
Submitted 20 January, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
A Combinatorial Characterization of Supervised Online Learnability
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
We study the online learnability of hypothesis classes with respect to arbitrary but bounded loss functions. No characterization of online learnability is known at this level of generality. We give a new scale-sensitive combinatorial dimension, named the sequential minimax dimension, and show that it gives a tight quantitative characterization of online learnability. In addition, we show that the sequential minimax dimension subsumes most existing combinatorial dimensions in online learning theory.
Submitted 9 February, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Online Learning with Set-Valued Feedback
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimensions, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension characterizes online learnability in the agnostic setting and tightly quantifies the minimax regret. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response.
Submitted 18 June, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
On the Learnability of Multilabel Ranking
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
Multilabel ranking is a central task in machine learning. However, the most fundamental question of learnability in a multilabel ranking setting with relevance-score feedback remains unanswered. In this work, we characterize the learnability of multilabel ranking problems in both batch and online settings for a large family of ranking losses. Along the way, we give two equivalence classes of ranking losses based on learnability that capture most, if not all, losses used in practice.
Submitted 25 May, 2023; v1 submitted 6 April, 2023;
originally announced April 2023.
-
Multiclass Online Learning and Uniform Convergence
Authors:
Steve Hanneke,
Shay Moran,
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
We study multiclass classification in the agnostic adversarial online learning setting. As our main result, we prove that any multiclass concept class is agnostically learnable if and only if its Littlestone dimension is finite. This solves an open problem studied by Daniely, Sabato, Ben-David, and Shalev-Shwartz (2011, 2015), who handled the case when the number of classes (or labels) is bounded. We also prove a separation between online learnability and online uniform convergence by exhibiting an easy-to-learn class whose sequential Rademacher complexity is unbounded.
Our learning algorithm uses the multiplicative weights algorithm, with a set of experts defined by executions of the Standard Optimal Algorithm on subsequences of size Littlestone dimension. We argue that the best expert has regret at most Littlestone dimension relative to the best concept in the class. This differs from the well-known covering technique of Ben-David, Pál, and Shalev-Shwartz (2009) for binary classification, where the best expert has regret zero.
Submitted 7 July, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
-
A Characterization of Multioutput Learnability
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
We consider the problem of learning multioutput function classes in the batch and online settings. In both settings, we show that a multioutput function class is learnable if and only if each single-output restriction of the function class is learnable. This provides a complete characterization of the learnability of multilabel classification and multioutput regression in both batch and online settings. As an extension, we also consider multilabel learnability in the bandit feedback setting and show a characterization similar to that in the full-feedback setting.
Submitted 24 November, 2024; v1 submitted 6 January, 2023;
originally announced January 2023.
-
On Proper Learnability between Average- and Worst-case Robustness
Authors:
Vinod Raman,
Unique Subedi,
Ambuj Tewari
Abstract:
Recently, Montasser et al. [2019] showed that finite VC dimension is not sufficient for proper adversarially robust PAC learning. In light of this hardness, there is a growing effort to study which relaxations of the adversarially robust PAC learning setup can enable proper learnability. In this work, we initiate the study of proper learning under relaxations of the worst-case robust loss. We give a family of robust loss relaxations under which VC classes are properly PAC learnable with sample complexity close to what one would require in the standard PAC learning setup. On the other hand, we show that for an existing and natural relaxation of the worst-case robust loss, finite VC dimension is not sufficient for proper learning. Lastly, we give new generalization guarantees for the adversarially robust empirical risk minimizer.
Submitted 25 May, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
A conjectural asymptotic formula for multiplicative chaos in number theory
Authors:
Daksh Aggarwal,
Unique Subedi,
William Verreault,
Asif Zaman,
Chenghui Zheng
Abstract:
We investigate a special sequence of random variables $A(N)$ defined by an exponential power series with independent standard complex Gaussians $(X(k))_{k \geq 1}$. Introduced by Hughes, Keating, and O'Connell in the study of random matrix theory, this sequence relates to Gaussian multiplicative chaos (in particular ``holomorphic multiplicative chaos'' per Najnudel, Paquette, and Simm) and random multiplicative functions. Soundararajan and Zaman recently determined the order of $\mathbb{E}[|A(N)|]$. By constructing an algorithm to calculate $A(N)$ in $O(N^2 \log N)$ steps, we produce computational evidence that their result can likely be strengthened to an asymptotic result with a numerical estimate for the asymptotic constant. We also obtain similar conclusions when $A(N)$ is defined using standard real Gaussians or uniform $\pm 1$ random variables. However, our evidence suggests that the asymptotic constants do not possess a natural product structure.
Submitted 25 August, 2021;
originally announced August 2021.
-
Sums of random multiplicative functions over function fields with few irreducible factors
Authors:
Daksh Aggarwal,
Unique Subedi,
William Verreault,
Asif Zaman,
Chenghui Zheng
Abstract:
We establish a normal approximation for the limiting distribution of partial sums of random Rademacher multiplicative functions over function fields, provided the number of irreducible factors of the polynomials is small enough. This parallels work of Harper for random Rademacher multiplicative functions over the integers.
Submitted 28 January, 2022; v1 submitted 18 August, 2021;
originally announced August 2021.