From Data to Rewards: a Bilevel Optimization Perspective on Maximum Likelihood Estimation

Benechehab, Abdelhakim; Singer, Gabriel; Léger, Corentin; Hili, Youssef Attia El; Paolo, Giuseppe; Thomas, Albert; Filippone, Maurizio; Kégl, Balázs

arXiv:2510.07624 (stat)

[Submitted on 8 Oct 2025 (v1), last revised 13 Oct 2025 (this version, v3)]


Abstract: Generative models form the backbone of modern machine learning, underpinning state-of-the-art systems in text, vision, and multimodal applications. While Maximum Likelihood Estimation has traditionally served as the dominant training paradigm, recent work has highlighted its limitations, particularly in generalization and susceptibility to catastrophic forgetting, compared to Reinforcement Learning techniques such as Policy Gradient methods. However, these approaches depend on explicit reward signals, which are often unavailable in practice, leaving open the fundamental problem of how to align generative models when only high-quality datasets are accessible. In this work, we address this challenge via a Bilevel Optimization framework, where the reward function is treated as the optimization variable of an outer-level problem, while a policy gradient objective defines the inner-level problem. We then conduct a theoretical analysis of this optimization problem in a tractable setting and extract insights that, as we demonstrate, generalize to applications such as tabular classification and model-based reinforcement learning. We release the code at this https URL.

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2510.07624 [stat.ML]
	(or arXiv:2510.07624v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2510.07624

Submission history

From: Abdelhakim Benechehab
[v1] Wed, 8 Oct 2025 23:45:37 UTC (631 KB)
[v2] Fri, 10 Oct 2025 13:45:35 UTC (633 KB)
[v3] Mon, 13 Oct 2025 13:24:41 UTC (631 KB)
