Search | arXiv e-print repository

doi 10.1007/978-3-031-61716-4_4

Learning Guided Automated Reasoning: A Brief Survey

Authors: Lasse Blaauwbroek, David Cerna, Thibault Gauthier, Jan Jakubův, Cezary Kaliszyk, Martin Suda, Josef Urban

Abstract: Automated theorem provers and formal proof assistants are general reasoning systems that are in theory capable of proving arbitrarily hard theorems, thus solving arbitrary problems reducible to mathematics and logical reasoning. In practice, such systems however face large combinatorial explosion, and therefore include many heuristics and choice points that considerably influence their performance… ▽ More Automated theorem provers and formal proof assistants are general reasoning systems that are in theory capable of proving arbitrarily hard theorems, thus solving arbitrary problems reducible to mathematics and logical reasoning. In practice, such systems however face large combinatorial explosion, and therefore include many heuristics and choice points that considerably influence their performance. This is an opportunity for trained machine learning predictors, which can guide the work of such reasoning systems. Conversely, deductive search supported by the notion of logically valid proof allows one to train machine learning systems on large reasoning corpora. Such bodies of proof are usually correct by construction and when combined with more and more precise trained guidance they can be boostrapped into very large corpora, with increasingly long reasoning chains and possibly novel proof ideas. In this paper we provide an overview of several automated reasoning and theorem proving domains and the learning and AI methods that have been so far developed for them. These include premise selection, proof guidance in several settings, AI systems and feedback loops iterating between reasoning and learning, and symbolic classification problems. △ Less

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2401.02950 [pdf, other]

The Tactician's Web of Large-Scale Formal Knowledge

Authors: Lasse Blaauwbroek

Abstract: The Tactician's Web is a platform offering a large web of strongly interconnected, machine-checked, formal mathematical knowledge conveniently packaged for machine learning, analytics, and proof engineering. Built on top of the Coq proof assistant, the platform exports a dataset containing a wide variety of formal theories, presented as a web of definitions, theorems, proof terms, tactics, and pro… ▽ More The Tactician's Web is a platform offering a large web of strongly interconnected, machine-checked, formal mathematical knowledge conveniently packaged for machine learning, analytics, and proof engineering. Built on top of the Coq proof assistant, the platform exports a dataset containing a wide variety of formal theories, presented as a web of definitions, theorems, proof terms, tactics, and proof states. Theories are encoded both as a semantic graph (rendered below) and as human-readable text, each with a unique set of advantages and disadvantages. Proving agents may interact with Coq through the same rich data representation and can be automatically benchmarked on a set of theorems. Tight integration with Coq provides the unique possibility to make agents available to proof engineers as practical tools. △ Less

Submitted 9 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 47 pages

MSC Class: 03B40 (Primary) 68T30; 03B38 (Secondary) ACM Class: F.4.1

arXiv:2401.02949 [pdf, other]

Graph2Tac: Online Representation Learning of Formal Math Concepts

Authors: Lasse Blaauwbroek, Miroslav Olšák, Jason Rute, Fidel Ivan Schaposnik Massolo, Jelle Piepenbrock, Vasily Pestun

Abstract: In proof assistants, the physical proximity between two formal mathematical concepts is a strong predictor of their mutual relevance. Furthermore, lemmas with close proximity regularly exhibit similar proof structures. We show that this locality property can be exploited through online learning techniques to obtain solving agents that far surpass offline learners when asked to prove theorems in an… ▽ More In proof assistants, the physical proximity between two formal mathematical concepts is a strong predictor of their mutual relevance. Furthermore, lemmas with close proximity regularly exhibit similar proof structures. We show that this locality property can be exploited through online learning techniques to obtain solving agents that far surpass offline learners when asked to prove theorems in an unseen mathematical setting. We extensively benchmark two such online solvers implemented in the Tactician platform for the Coq proof assistant: First, Tactician's online $k$-nearest neighbor solver, which can learn from recent proofs, shows a $1.72\times$ improvement in theorems proved over an offline equivalent. Second, we introduce a graph neural network, Graph2Tac, with a novel approach to build hierarchical representations for new definitions. Graph2Tac's online definition task realizes a $1.5\times$ improvement in theorems solved over an offline baseline. The $k$-NN and Graph2Tac solvers rely on orthogonal online data, making them highly complementary. Their combination improves $1.27\times$ over their individual performances. Both solvers outperform all other general-purpose provers for Coq, including CoqHammer, Proverbot9001, and a transformer baseline by at least $1.48\times$ and are available for practical use by end-users. △ Less

Submitted 23 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 31 pages

MSC Class: 68T07 (Primary) 68V15 (Secondary) ACM Class: I.2.3; I.2.6

arXiv:2401.02948 [pdf, other]

doi 10.1145/3656459

Hashing Modulo Context-Sensitive $α$-Equivalence

Authors: Lasse Blaauwbroek, Miroslav Olšák, Herman Geuvers

Abstract: The notion of $α$-equivalence between $λ$-terms is commonly used to identify terms that are considered equal. However, due to the primitive treatment of free variables, this notion falls short when comparing subterms occurring within a larger context. Depending on the usage of the Barendregt convention (choosing different variable names for all involved binders), it will equate either too few or t… ▽ More The notion of $α$-equivalence between $λ$-terms is commonly used to identify terms that are considered equal. However, due to the primitive treatment of free variables, this notion falls short when comparing subterms occurring within a larger context. Depending on the usage of the Barendregt convention (choosing different variable names for all involved binders), it will equate either too few or too many subterms. We introduce a formal notion of context-sensitive $α$-equivalence, where two open terms can be compared within a context that resolves their free variables. We show that this equivalence coincides exactly with the notion of bisimulation equivalence. Furthermore, we present an efficient $O(n\log n)$ runtime hashing scheme that identifies $λ$-terms modulo context-sensitive $α$-equivalence, generalizing over traditional bisimulation partitioning algorithms and improving upon a previously established $O(n\log^2 n)$ bound for a hashing modulo ordinary $α$-equivalence by Maziarz et al. Hashing $λ$-terms is useful in many applications that require common subterm elimination and structure sharing. We have employed the algorithm to obtain a large-scale, densely packed, interconnected graph of mathematical knowledge from the Coq proof assistant for machine learning purposes. △ Less

Submitted 23 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 33 pages

MSC Class: 68N18 (Primary) 68N20; 03B40 (Secondary) ACM Class: D.3.1

arXiv:2104.05207 [pdf, other]

Online Machine Learning Techniques for Coq: A Comparison

Authors: Liao Zhang, Lasse Blaauwbroek, Bartosz Piotrowski, Prokop Černý, Cezary Kaliszyk, Josef Urban

Abstract: We present a comparison of several online machine learning techniques for tactical learning and proving in the Coq proof assistant. This work builds on top of Tactician, a plugin for Coq that learns from proofs written by the user to synthesize new proofs. Learning happens in an online manner, meaning that Tactician's machine learning model is updated immediately every time the user performs a ste… ▽ More We present a comparison of several online machine learning techniques for tactical learning and proving in the Coq proof assistant. This work builds on top of Tactician, a plugin for Coq that learns from proofs written by the user to synthesize new proofs. Learning happens in an online manner, meaning that Tactician's machine learning model is updated immediately every time the user performs a step in an interactive proof. This has important advantages compared to the more studied offline learning systems: (1) it provides the user with a seamless, interactive experience with Tactician and, (2) it takes advantage of locality of proof similarity, which means that proofs similar to the current proof are likely to be found close by. We implement two online methods, namely approximate k-nearest neighbors based on locality sensitive hashing forests and random decision forests. Additionally, we conduct experiments with gradient boosted trees in an offline setting using XGBoost. We compare the relative performance of Tactician using these three learning methods on Coq's standard library. △ Less

Submitted 7 June, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Intelligent Computer Mathematics 14th International Conference, CICM 2021

arXiv:2008.00120 [pdf, other]

doi 10.1007/978-3-030-53518-6_17

The Tactician (extended version): A Seamless, Interactive Tactic Learner and Prover for Coq

Authors: Lasse Blaauwbroek, Josef Urban, Herman Geuvers

Abstract: We present Tactician, a tactic learner and prover for the Coq Proof Assistant. Tactician helps users make tactical proof decisions while they retain control over the general proof strategy. To this end, Tactician learns from previously written tactic scripts and gives users either suggestions about the next tactic to be executed or altogether takes over the burden of proof synthesis. Tactician's g… ▽ More We present Tactician, a tactic learner and prover for the Coq Proof Assistant. Tactician helps users make tactical proof decisions while they retain control over the general proof strategy. To this end, Tactician learns from previously written tactic scripts and gives users either suggestions about the next tactic to be executed or altogether takes over the burden of proof synthesis. Tactician's goal is to provide users with a seamless, interactive, and intuitive experience together with robust and adaptive proof automation. In this paper, we give an overview of Tactician from the user's point of view, regarding both day-to-day usage and issues of package dependency management while learning in the large. Finally, we give a peek into Tactician's implementation as a Coq plugin and machine learning platform. △ Less

Submitted 31 July, 2020; originally announced August 2020.

Comments: 19 pages, 2 figures. This is an extended version of a paper published in CICM-2020. For the project website, see https://coq-tactician.github.io

MSC Class: 68T27 ACM Class: F.4.1

Journal ref: In CICM. volume 12236 of Lecture Notes in Computer Science, pages 271-277. Springer, 2020

arXiv:2003.09140 [pdf, other]

doi 10.29007/wg1q

Tactic Learning and Proving for the Coq Proof Assistant

Authors: Lasse Blaauwbroek, Josef Urban, Herman Geuvers

Abstract: We present a system that utilizes machine learning for tactic proof search in the Coq Proof Assistant. In a similar vein as the TacticToe project for HOL4, our system predicts appropriate tactics and finds proofs in the form of tactic scripts. To do this, it learns from previous tactic scripts and how they are applied to proof states. The performance of the system is evaluated on the Coq Standard… ▽ More We present a system that utilizes machine learning for tactic proof search in the Coq Proof Assistant. In a similar vein as the TacticToe project for HOL4, our system predicts appropriate tactics and finds proofs in the form of tactic scripts. To do this, it learns from previous tactic scripts and how they are applied to proof states. The performance of the system is evaluated on the Coq Standard Library. Currently, our predictor can identify the correct tactic to be applied to a proof state 23.4% of the time. Our proof searcher can fully automatically prove 39.3% of the lemmas. When combined with the CoqHammer system, the two systems together prove 56.7% of the library's lemmas. △ Less

Submitted 20 March, 2020; originally announced March 2020.

Comments: 12 pages, 2 figures, 1 table. For the associated artefacts, see https://doi.org/10.5281/zenodo.3693760

MSC Class: 68T15 ACM Class: F.4.1

Journal ref: In LPAR, volume 73 of EPiC Series in Computing, pages 138-150. Easychair, 2020

Showing 1–7 of 7 results for author: Blaauwbroek, L