Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning

Shrivastava, Vaishnavi; Awadallah, Ahmed; Balachandran, Vidhisha; Garg, Shivam; Behl, Harkirat; Papailiopoulos, Dimitris

arXiv:2508.09726 (cs)

[Submitted on 13 Aug 2025]

Abstract: Large language models trained with reinforcement learning with verifiable rewards tend to trade accuracy for length: they inflate response lengths to achieve gains in accuracy. While longer answers may be warranted for harder problems, many tokens are merely "filler": repetitive, verbose text that makes no real progress. We introduce GFPO (Group Filtered Policy Optimization), which curbs this length explosion by sampling larger groups per problem during training and filtering the responses to train on based on two key metrics: (1) response length and (2) token efficiency (the reward-per-token ratio). By sampling more at training time, we teach models to think less at inference time. On the Phi-4-reasoning model, GFPO cuts GRPO's length inflation by 46-71% across challenging STEM and coding benchmarks (AIME 24/25, GPQA, Omni-MATH, LiveCodeBench) while maintaining accuracy. Optimizing for reward per token further increases the reduction in length inflation to 71-85%. We also propose Adaptive Difficulty GFPO, which dynamically allocates more training resources to harder problems based on real-time difficulty estimates, improving the balance between computational efficiency and accuracy, especially on difficult questions. GFPO demonstrates that increased training-time compute directly translates to reduced test-time compute, a simple yet effective trade-off for efficient reasoning.
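The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Response` type, the `filter_group` function, the retained-subset size `k`, and the tie-breaking behavior are all hypothetical names and choices introduced here for illustration. It only shows how a sampled group might be ranked by response length or by reward per token before training on the retained subset.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One sampled response for a problem (hypothetical container)."""
    text: str
    num_tokens: int
    reward: float  # verifiable reward, e.g. 1.0 if the final answer is correct


def token_efficiency(r: Response) -> float:
    """Reward-per-token ratio; guards against zero-length responses."""
    return r.reward / max(r.num_tokens, 1)


def filter_group(group: list[Response], k: int, metric: str = "length") -> list[Response]:
    """Keep k responses from a sampled group (assumed selection rule).

    metric="length":           keep the k shortest responses.
    metric="token_efficiency": keep the k responses with the highest reward per token.
    """
    if metric == "length":
        ranked = sorted(group, key=lambda r: r.num_tokens)
    elif metric == "token_efficiency":
        ranked = sorted(group, key=token_efficiency, reverse=True)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return ranked[:k]


# Usage: sample a larger group, then train only on the retained subset.
group = [
    Response("a", num_tokens=100, reward=1.0),
    Response("b", num_tokens=50, reward=0.0),
    Response("c", num_tokens=400, reward=1.0),
    Response("d", num_tokens=200, reward=1.0),
]
shortest = filter_group(group, k=2, metric="length")
most_efficient = filter_group(group, k=2, metric="token_efficiency")
```

In this toy group, filtering by length alone retains the short but incorrect response "b", whereas filtering by reward per token retains only correct responses. How the retained responses enter the policy update is not specified in the abstract and is left out of this sketch.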

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2508.09726 [cs.CL]
	(or arXiv:2508.09726v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2508.09726

Submission history

From: Vaishnavi Shrivastava
[v1] Wed, 13 Aug 2025 11:43:49 UTC (1,573 KB)
