-
Using Reasoning Models to Generate Search Heuristics that Solve Open Instances of Combinatorial Design Problems
Authors:
Christopher D. Rosin
Abstract:
Large Language Models (LLMs) with reasoning are trained to iteratively generate and refine their answers before finalizing them, which can help with applications to mathematics and code generation. We apply code generation with reasoning LLMs to a specific task in the mathematical field of combinatorial design. This field studies diverse types of combinatorial designs, many of which have lists of…
▽ More
Large Language Models (LLMs) with reasoning are trained to iteratively generate and refine their answers before finalizing them, which can help with applications to mathematics and code generation. We apply code generation with reasoning LLMs to a specific task in the mathematical field of combinatorial design. This field studies diverse types of combinatorial designs, many of which have lists of open instances for which existence has not yet been determined. The Constructive Protocol CPro1 uses LLMs to generate search heuristics that have the potential to construct solutions to small open instances. Starting with a textual definition and a validity verifier for a particular type of design, CPro1 guides LLMs to select and implement strategies, while providing automated hyperparameter tuning and execution feedback. CPro1 with reasoning LLMs successfully solves long-standing open instances for 7 of 16 combinatorial design problems selected from the 2006 Handbook of Combinatorial Designs, including new solved instances for 3 of these (Bhaskar Rao Designs, Symmetric Weighing Matrices, Balanced Ternary Designs) that were unsolved by CPro1 with non-reasoning LLMs. It also solves open instances for several problems from recent (2025) literature, generating new Covering Sequences, Johnson Clique Covers, Deletion Codes, and a Uniform Nested Steiner Quadruple System.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
EvidenceBench: A Benchmark for Extracting Evidence from Biomedical Papers
Authors:
Jianyou Wang,
Weili Cao,
Kaicheng Wang,
Xiaoyue Wang,
Ashish Dalvi,
Gino Prasad,
Qishan Liang,
Hsuan-lin Her,
Ming Wang,
Qin Yang,
Gene W. Yeo,
David E. Neal,
Maxim Khan,
Christopher D. Rosin,
Ramamohan Paturi,
Leon Bergen
Abstract:
We study the task of automatically finding evidence relevant to hypotheses in biomedical papers. Finding relevant evidence is an important step when researchers investigate scientific hypotheses. We introduce EvidenceBench to measure models performance on this task, which is created by a novel pipeline that consists of hypothesis generation and sentence-by-sentence annotation of biomedical papers…
▽ More
We study the task of automatically finding evidence relevant to hypotheses in biomedical papers. Finding relevant evidence is an important step when researchers investigate scientific hypotheses. We introduce EvidenceBench to measure models performance on this task, which is created by a novel pipeline that consists of hypothesis generation and sentence-by-sentence annotation of biomedical papers for relevant evidence, completely guided by and faithfully following existing human experts judgment. We demonstrate the pipeline's validity and accuracy with multiple sets of human-expert annotations. We evaluated a diverse set of language models and retrieval systems on the benchmark and found that model performances still fall significantly short of the expert level on this task. To show the scalability of our proposed pipeline, we create a larger EvidenceBench-100k with 107,461 fully annotated papers with hypotheses to facilitate model training and development. Both datasets are available at https://github.com/EvidenceBench/EvidenceBench
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
Using Code Generation to Solve Open Instances of Combinatorial Design Problems
Authors:
Christopher D. Rosin
Abstract:
The Handbook of Combinatorial Designs catalogs many types of combinatorial designs, together with lists of open instances for which existence has not yet been determined. We develop a constructive protocol CPro1, which uses Large Language Models (LLMs) to generate code that constructs combinatorial designs and resolves some of these open instances. The protocol starts from a definition of a partic…
▽ More
The Handbook of Combinatorial Designs catalogs many types of combinatorial designs, together with lists of open instances for which existence has not yet been determined. We develop a constructive protocol CPro1, which uses Large Language Models (LLMs) to generate code that constructs combinatorial designs and resolves some of these open instances. The protocol starts from a definition of a particular type of design, and a verifier that reliably confirms whether a proposed design is valid. The LLM selects strategies and implements them in code, and scaffolding provides automated hyperparameter tuning and execution feedback using the verifier. Most generated code fails, but by generating many candidates, the protocol automates exploration of a variety of standard methods (e.g. simulated annealing, genetic algorithms) and experimentation with variations (e.g. cost functions) to find successful approaches. Testing on 16 different types of designs, CPro1 constructs solutions to open instances for 6 of them: Symmetric and Skew Weighing Matrices, Equidistant Permutation Arrays, Packing Arrays, Balanced Ternary Designs, and Florentine Rectangles.
△ Less
Submitted 29 January, 2025;
originally announced January 2025.
-
Stepping Stones to Inductive Synthesis of Low-Level Looping Programs
Authors:
Christopher D. Rosin
Abstract:
Inductive program synthesis, from input/output examples, can provide an opportunity to automatically create programs from scratch without presupposing the algorithmic form of the solution. For induction of general programs with loops (as opposed to loop-free programs, or synthesis for domain-specific languages), the state of the art is at the level of introductory programming assignments. Most pro…
▽ More
Inductive program synthesis, from input/output examples, can provide an opportunity to automatically create programs from scratch without presupposing the algorithmic form of the solution. For induction of general programs with loops (as opposed to loop-free programs, or synthesis for domain-specific languages), the state of the art is at the level of introductory programming assignments. Most problems that require algorithmic subtlety, such as fast sorting, have remained out of reach without the benefit of significant problem-specific background knowledge. A key challenge is to identify cues that are available to guide search towards correct looping programs. We present MAKESPEARE, a simple delayed-acceptance hillclimbing method that synthesizes low-level looping programs from input/output examples. During search, delayed acceptance bypasses small gains to identify significantly-improved stepping stone programs that tend to generalize and enable further progress. The method performs well on a set of established benchmarks, and succeeds on the previously unsolved "Collatz Numbers" program synthesis problem. Additional benchmarks include the problem of rapidly sorting integer arrays, in which we observe the emergence of comb sort (a Shell sort variant that is empirically fast). MAKESPEARE has also synthesized a record-setting program on one of the puzzles from the TIS-100 assembly language programming game.
△ Less
Submitted 26 November, 2018;
originally announced November 2018.
-
Unweighted Stochastic Local Search can be Effective for Random CSP Benchmarks
Authors:
Christopher D. Rosin
Abstract:
We present ULSA, a novel stochastic local search algorithm for random binary constraint satisfaction problems (CSP). ULSA is many times faster than the prior state of the art on a widely-studied suite of random CSP benchmarks. Unlike the best previous methods for these benchmarks, ULSA is a simple unweighted method that does not require dynamic adaptation of weights or penalties. ULSA obtains new…
▽ More
We present ULSA, a novel stochastic local search algorithm for random binary constraint satisfaction problems (CSP). ULSA is many times faster than the prior state of the art on a widely-studied suite of random CSP benchmarks. Unlike the best previous methods for these benchmarks, ULSA is a simple unweighted method that does not require dynamic adaptation of weights or penalties. ULSA obtains new record best solutions satisfying 99 of 100 variables in the challenging frb100-40 benchmark instance.
△ Less
Submitted 27 November, 2014;
originally announced November 2014.