Murphys Laws of AI Alignment: Why the Gap Always Wins

Gaikwad, Madhava

arXiv:2509.05381 (cs)

[Submitted on 4 Sep 2025 (v1), last revised 15 Sep 2025 (this version, v3)]

Abstract: We study reinforcement learning from human feedback under misspecification. Sometimes human feedback is systematically wrong on certain types of inputs, like a broken compass that points the wrong way in specific regions. We prove that when feedback is biased on a fraction alpha of contexts with bias strength epsilon, any learning algorithm needs exponentially many samples, exp(n*alpha*epsilon^2), to distinguish between two possible "true" reward functions that differ only on these problematic contexts. However, if you can identify where feedback is unreliable (via a "calibration oracle"), you can focus your limited queries there and overcome the exponential barrier with just O(1/(alpha*epsilon^2)) queries. This quantifies why alignment is hard: rare edge cases with subtly biased feedback create an exponentially hard learning problem unless you know where to look.
The gap between what we optimize (proxy from human feedback) and what we want (true objective) is fundamentally limited by how common the problematic contexts are (alpha), how wrong the feedback is there (epsilon), and how much the true objectives disagree there (gamma). Murphy's Law for AI alignment: the gap always wins unless you actively route around misspecification.
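
The routing intuition in the abstract can be illustrated with a minimal Python sketch. It is not the paper's proof or algorithm. It only contrasts how many queries land on the problematic contexts when a learner samples contexts uniformly versus when a calibration oracle lets it send every query there, and it evaluates the stated O(1/(alpha*epsilon^2)) scaling for given parameters. The function names, the `seed` parameter, and the leading `constant` are assumptions made for illustration; the abstract gives only the big-O form.

```python
import math
import random


def informative_queries(budget, alpha, use_oracle, seed=0):
    """Count how many of `budget` queries land on problematic contexts.

    Assumption: without an oracle, each query hits a problematic context
    independently with probability `alpha`. With an oracle, every query
    is routed to a problematic context.
    """
    if use_oracle:
        return budget
    rng = random.Random(seed)
    return sum(1 for _ in range(budget) if rng.random() < alpha)


def oracle_query_budget(alpha, epsilon, constant=1.0):
    """Evaluate constant / (alpha * epsilon^2), rounded up.

    `constant` is a hypothetical placeholder for the factor hidden by
    the big-O notation in the abstract.
    """
    return math.ceil(constant / (alpha * epsilon ** 2))


if __name__ == "__main__":
    alpha, epsilon, budget = 0.1, 0.2, 200
    print("uniform sampling:", informative_queries(budget, alpha, use_oracle=False))
    print("oracle routing:  ", informative_queries(budget, alpha, use_oracle=True))
    print("oracle budget:   ", oracle_query_budget(alpha, epsilon))
```

With a small alpha, uniform sampling spends most of its budget on contexts where the two candidate reward functions agree, which is the informal reason knowing where to look matters.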

Comments:	Provides a formal impossibility theorem (Murphys Gap) and welcomes collaboration on large-scale experiments and benchmark design
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
MSC classes:	68T01, 68T20, 68Q87
ACM classes:	F.2.2; I.2.6; I.2.7; I.2.8
Cite as:	arXiv:2509.05381 [cs.AI]
	(or arXiv:2509.05381v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2509.05381

Submission history

From: Madhava Gaikwad
[v1] Thu, 4 Sep 2025 23:03:25 UTC (375 KB)
[v2] Wed, 10 Sep 2025 14:35:38 UTC (370 KB)
[v3] Mon, 15 Sep 2025 06:39:32 UTC (21 KB)
