Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering

Mohamed, Anas; Khan, Azal Ahmad; Wang, Xinran; Khan, Ahmad Faraz; Ge, Shuwen; Khan, Saman Bahzad; Ahmad, Ayaan; Anwar, Ali

arXiv:2507.20133 (cs)

[Submitted on 27 Jul 2025 (v1), last revised 29 Jul 2025 (this version, v2)]

Abstract: Generative AI can now synthesize strikingly realistic images from text, yet output quality remains highly sensitive to how prompts are phrased. Direct Preference Optimization (DPO) offers a lightweight, off-policy alternative to reinforcement learning (RL) for automatic prompt engineering. However, its token-level regularization leaves semantic inconsistency unchecked: prompts that win higher preference scores can still drift away from the user's intended meaning.
We introduce Sem-DPO, a variant of DPO that preserves semantic consistency while retaining DPO's simplicity and efficiency. Sem-DPO scales the DPO loss by a weight that depends on how semantically different the winning prompt is from the original. This reduces the influence of training examples that are semantically misaligned. We provide the first analytical bound on semantic drift for preference-tuned prompt generators, showing that Sem-DPO keeps learned prompts within a provably bounded neighborhood of the original text. We evaluate on three standard text-to-image prompt-optimization benchmarks and two language models. Sem-DPO achieves 8-12% higher CLIP similarity and 5-9% higher human-preference scores (HPSv2.1, PickScore) than DPO, and it also outperforms state-of-the-art baselines. These findings suggest that strong flat baselines augmented with semantic weighting should become the new standard for prompt-optimization studies. They also lay the groundwork for broader, semantics-aware preference optimization in language models.
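The abstract describes the semantic weighting only at a high level, so the sketch below is illustrative only. It assumes the weight is an exponential decay in the cosine distance between embeddings of the original prompt and the winning prompt. That choice, the embedding vectors, the names `semantic_weight` and `sem_dpo_loss`, and the hyperparameters `alpha` and `beta` are all hypothetical stand-ins, not the authors' code. The underlying DPO term is the standard one: the negative log-sigmoid of beta times the difference in policy-vs-reference log-ratios between the winning and losing prompts.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_weight(orig_emb: Sequence[float], win_emb: Sequence[float],
                    alpha: float = 1.0) -> float:
    """ASSUMED weighting: exp(-alpha * cosine distance), which lies in (0, 1].

    An identical meaning gives weight 1; more semantic drift gives a smaller weight.
    """
    distance = 1.0 - cosine_similarity(orig_emb, win_emb)
    return math.exp(-alpha * distance)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # -log(sigmoid(m)) == softplus(-m)
    return softplus(-beta * margin)


def sem_dpo_loss(logp_w: float, logp_l: float,
                 ref_logp_w: float, ref_logp_l: float,
                 orig_emb: Sequence[float], win_emb: Sequence[float],
                 alpha: float = 1.0, beta: float = 0.1) -> float:
    """Hypothetical Sem-DPO-style loss: the DPO loss scaled by a semantic weight."""
    weight = semantic_weight(orig_emb, win_emb, alpha)
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)


if __name__ == "__main__":
    # Same log-probabilities; only the winner's semantic alignment differs.
    aligned = sem_dpo_loss(-1.0, -2.0, -1.5, -1.5, [1.0, 0.0], [1.0, 0.0])
    drifted = sem_dpo_loss(-1.0, -2.0, -1.5, -1.5, [1.0, 0.0], [0.0, 1.0])
    print(f"aligned pair loss: {aligned:.4f}")
    print(f"drifted pair loss: {drifted:.4f}")
```

In this sketch a pair whose winning prompt drifts in meaning contributes a down-weighted loss, and hence a smaller gradient. That mirrors the abstract's stated goal of reducing the impact of semantically misaligned training examples. In practice the embeddings would come from a text encoder, and the loss would be averaged over a batch.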

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2507.20133 [cs.CL]
	(or arXiv:2507.20133v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2507.20133

Submission history

From: Azal Ahmad Khan [view email]
[v1] Sun, 27 Jul 2025 05:20:13 UTC (10,692 KB)
[v2] Tue, 29 Jul 2025 04:18:09 UTC (10,693 KB)