-
Proof-Producing Translation of Functional Programs into a Time & Space Reasonable Model
Authors:
Kevin Kappelmann,
Fabian Huch,
Lukas Stevens,
Mohammad Abdulaziz
Abstract:
We present a semi-automated framework to construct and reason about programs in a deeply-embedded while-language. The while-language we consider is a simple computation model that can simulate (and be simulated by) Turing Machines with a quadratic time and constant space blow-up. Our framework derives while-programs from functional programs written in a subset of Isabelle/HOL, namely tail-recursive functions with first-order arguments and algebraic datatypes. As far as we are aware, it is the first framework targeting a computation model that is reasonable in time and space from a complexity-theoretic perspective.
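This is an informal illustration of the source fragment only, not of the framework itself, which targets a deeply-embedded while-language inside Isabelle/HOL and produces correctness proofs. The Python sketch shows why tail-recursive functions over algebraic datatypes map naturally onto while-loops: each tail call becomes a simultaneous update of the argument variables, and the base case ends the loop. The `IntList` datatype and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical algebraic datatype of integer lists: Nil | Cons(head, tail).
@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Cons:
    head: int
    tail: "IntList"


IntList = Union[Nil, Cons]


def rev_rec(xs: IntList, acc: IntList) -> IntList:
    """Tail-recursive list reversal: the recursive call is the last action."""
    if isinstance(xs, Nil):
        return acc
    return rev_rec(xs.tail, Cons(xs.head, acc))


def rev_loop(xs: IntList, acc: IntList) -> IntList:
    """The same function as a while-loop: each tail call becomes an
    update of the argument variables, and the base case ends the loop."""
    while not isinstance(xs, Nil):
        # Both right-hand sides are evaluated before assignment,
        # mirroring the simultaneous binding of a function call.
        xs, acc = xs.tail, Cons(xs.head, acc)
    return acc


def from_py(values: list) -> IntList:
    """Build an IntList from a Python list (helper for the example)."""
    result: IntList = Nil()
    for v in reversed(values):
        result = Cons(v, result)
    return result
```

For example, `rev_loop(from_py([1, 2, 3]), Nil())` equals `from_py([3, 2, 1])`, the same value `rev_rec` computes, but without growing the call stack.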
Submitted 21 April, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Isabelle as Systems Platform: Managing Automated and Quasi-interactive Builds
Authors:
Fabian Huch
Abstract:
Interactive theorem provers are complex systems that require sophisticated platform efforts, and hence systems programming environments, to be managed effectively. The Isabelle platform exemplifies this with its Isabelle/Scala systems programming environment, which has proven very successful. In contrast, much of the project infrastructure has in the past relied on external tooling, despite its shortcomings. For continuous integration, the previous system employed a Jenkins server, which did not adequately support user-submitted Isabelle builds and had problems with reliability and performance. In this work, we present the design and implementation of a new Isabelle build manager that replaces the old continuous integration system and is implemented entirely within Isabelle/Scala. We illustrate how our implementation uses different modules of the environment, which supported all aspects of the build manager well.
Submitted 17 December, 2024;
originally announced December 2024.
-
The Isabelle Community Benchmark
Authors:
Fabian Huch,
Vincent Bode
Abstract:
Choosing hardware for theorem proving is no simple task: automated provers are highly complex and optimized programs, often using a parallel computation model, and there is little prior research on how hardware affects prover performance. To address this for Isabelle, we initiated a community benchmark that measures the build time of HOL-Analysis. On $54$ distinct CPUs, Isabelle users reported a total of $669$ runs with different Isabelle configurations. Results range from $107$s to over $11$h. We found that current consumer CPUs performed best, with an optimal number of $8$ to $16$ threads, largely independent of heap memory. As for hardware parameters, CPU base clock affected multi-threaded execution most (correlation coefficient $0.37$), whereas boost frequency was the most influential parameter for single-threaded runs (correlation coefficient $0.55$); cache size played no significant role. Comparing our benchmark scores with popular high-performance computing benchmarks, we found a strong linear relationship with Dolfyn ($R^2 = 0.79$) in the single-threaded scenario. Using data from the 3DMark CPU Profile consumer benchmark, we created a linear model for optimal (multi-threaded) Isabelle performance. In validation, the model achieves an average $R^2$-score of $0.87$; the mean absolute error of the final model corresponds to a wall-clock time of $46.6$s. With a dataset of true median values for the 3DMark, the error improves to $37.1$s.
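The statistics named in this abstract can be made concrete with a short sketch. The Python code implements the standard textbook definitions of the Pearson correlation coefficient, a single-predictor least-squares line, the coefficient of determination $R^2$, and the mean absolute error. It does not reproduce the authors' model, features, or validation procedure, and no benchmark data is included; the function names are illustrative.

```python
from math import sqrt
from statistics import mean
from typing import Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_score(ys: Sequence[float], preds: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def mean_abs_error(ys: Sequence[float], preds: Sequence[float]) -> float:
    """Average absolute deviation between observations and predictions."""
    return mean([abs(y - p) for y, p in zip(ys, preds)])
```

For a single-predictor least-squares fit evaluated on its own training data, $R^2$ equals the square of the Pearson correlation; an $R^2$ measured on held-out validation data, as reported above, need not.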
Submitted 28 September, 2022;
originally announced September 2022.
-
Structure in Theorem Proving: Analyzing and Improving the Isabelle Archive of Formal Proofs
Authors:
Fabian Huch
Abstract:
The Isabelle Archive of Formal Proofs has grown to a significant size in recent years. It constitutes an impressive body of research that enables a number of statistical approaches to various aspects of theorem proving and has not yet been exploited exhaustively. However, its growing size also poses challenges: material becomes increasingly hard to find, and reusability and ease of understanding become more important. This thesis abstract summarizes my research plans on these topics and briefly presents preliminary results, which indicate that the node in-degree of the archive's dependency graph follows a scale-free distribution.
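To make "node in-degree of the dependency graph" concrete, here is a minimal Python sketch that computes in-degrees from an edge list, the resulting degree histogram, and its complementary cumulative distribution (CCDF). The edge orientation is an assumption: an edge `(a, b)` is read as "a depends on b", so the in-degree of `b` counts how many nodes use it. The function names are hypothetical and the sketch does not reflect how the archive's graph was actually extracted.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def in_degrees(edges: Iterable[Tuple[Hashable, Hashable]]) -> Counter:
    """In-degree of every node that occurs in some edge.

    Assumed orientation: (a, b) means 'a depends on b', so the
    in-degree of b counts how many nodes use b."""
    deg: Counter = Counter()
    for src, dst in edges:
        deg.setdefault(src, 0)  # nodes without incoming edges get degree 0
        deg[dst] += 1
    return deg


def degree_histogram(deg: Counter) -> Counter:
    """Map each in-degree k to the number of nodes with in-degree k."""
    return Counter(deg.values())


def ccdf(deg: Counter) -> Dict[int, float]:
    """Fraction of nodes with in-degree >= k, for each observed k."""
    n = len(deg)
    hist = degree_histogram(deg)
    result, remaining = {}, n
    for k in sorted(hist):
        result[k] = remaining / n
        remaining -= hist[k]
    return result
```

A power-law in-degree distribution $P(k) \propto k^{-\gamma}$ appears as an approximately straight line when the CCDF is plotted on log-log axes; confirming it rigorously requires a proper statistical fit rather than visual inspection.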
Submitted 27 September, 2022;
originally announced September 2022.
-
A Linter for Isabelle: Implementation and Evaluation
Authors:
Yecine Megdiche,
Fabian Huch,
Lukas Stevens
Abstract:
In interactive theorem proving, formalization quality is a key factor for maintainability and re-usability of developments and can also impact proof-checking performance. Anti-patterns that cause quality issues are commonly known to experienced users. However, many theorem prover systems have no automatic tools to check for their presence and make less experienced users aware of them. We attempt to fill this gap in the Isabelle environment by developing a linter as a publicly available add-on component. The linter offers basic configurability, extensibility, Isabelle/jEdit integration, and a standalone command-line tool. We uncovered 480 potential problems in Isabelle/HOL, 14016 in other formalizations of the Isabelle distribution, and an astonishing 59573 in the AFP. With a specific lint bundle for AFP submissions, we found that submission guidelines were violated in 1595 cases. We set out to alleviate problems in Isabelle/HOL and have solved 168 of them so far. We found that high-severity lints corresponded to actual problems most of the time, that individual users often made the same mistakes in many places, and that solving those problems retrospectively requires a substantial amount of work. In contrast, solving these problems interactively for new developments usually incurs little overhead, as we found in a quantitative user survey with 22 participants (less than a minute for more than 60% of participants). We also found that a good explanation of problems is key to how easily users solve them (correlation coefficient 0.48) and to their satisfaction with the end result (correlation coefficient 0.62).
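The notions of lint rules, severities, and lint bundles mentioned above can be sketched in a few lines of Python. Everything here is hypothetical: the rule names, regular expressions, severities, and bundle names are illustrative and are not the rule set or API of the actual Isabelle linter, which works on Isabelle's own representation of theory content rather than on raw text lines.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str  # "low" or "high"
    pattern: "re.Pattern[str]"
    message: str


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str
    line: int
    message: str


# Hypothetical rules, not taken from the actual linter.
RULES = {
    "unfinished_proof": Rule("unfinished_proof", "high",
                             re.compile(r"\b(sorry|oops)\b"),
                             "proof is not finished"),
    "apply_style": Rule("apply_style", "low",
                        re.compile(r"^\s*apply\b"),
                        "apply-script step; consider a structured Isar proof"),
}

# Hypothetical bundles: named selections of rules.
BUNDLES = {
    "all": ["unfinished_proof", "apply_style"],
    "submission": ["unfinished_proof"],
}

SEVERITY_ORDER = {"low": 0, "high": 1}


def lint(source: str, bundle: str = "all", min_severity: str = "low") -> List[Finding]:
    """Run the rules of one bundle line by line over theory source text,
    keeping only rules at or above the given severity."""
    rules = [RULES[name] for name in BUNDLES[bundle]
             if SEVERITY_ORDER[RULES[name].severity] >= SEVERITY_ORDER[min_severity]]
    findings = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        for rule in rules:
            if rule.pattern.search(text):
                findings.append(Finding(rule.name, rule.severity, lineno, rule.message))
    return findings
```

Selecting a bundle and a minimum severity is one simple way to obtain the kind of configurability described in the abstract; adding a rule means adding one entry to `RULES`.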
Submitted 21 July, 2022;
originally announced July 2022.
-
FindFacts: A Scalable Theorem Search
Authors:
Fabian Huch,
Alexander Krauss
Abstract:
The Isabelle Archive of Formal Proofs (AFP) had grown to over 500 articles by late 2019. Meanwhile, finding formalizations in it has not exactly become easier. At the time of writing, the site-specific AFP Google search and the Isabelle find_theorems and find_consts commands (which only work on imported theories) are still the only tools readily available for finding formalizations in Isabelle. We present FindFacts, a novel domain-specific search tool for formal Isabelle theory content. Instead of using term unification, we solve the problem with a classical drill-down search engine. We put special emphasis on the scalability of the search system, so that the whole AFP can be searched interactively.
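The difference between term unification and a drill-down search can be illustrated with a small Python sketch: a keyword query returns hits, facet counts over those hits suggest the next refinement, and each chosen facet value narrows the result set further. The record fields (`kind`, `session`) and function names are hypothetical and not FindFacts' actual schema; a scalable engine would answer the same queries from an inverted index instead of the linear scan used here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    name: str
    kind: str     # hypothetical facet, e.g. "fact" or "constant"
    session: str  # hypothetical facet: the session the entry belongs to
    text: str


def search(entries: List[Entry], query: str = "",
           filters: Optional[Dict[str, str]] = None) -> List[Entry]:
    """Keyword search narrowed by exact-match facet filters (drill-down).
    Every query term must occur in the entry's name or text."""
    filters = filters or {}
    terms = query.lower().split()
    return [e for e in entries
            if all(t in e.name.lower() or t in e.text.lower() for t in terms)
            and all(getattr(e, field) == value for field, value in filters.items())]


def facet_counts(hits: List[Entry], field: str) -> Counter:
    """Count hits per value of a facet; these counts are what a user
    interface would offer as the next drill-down choices."""
    return Counter(getattr(e, field) for e in hits)
```

A typical interaction first calls `search(entries, "append")`, shows `facet_counts(hits, "kind")` to the user, and then repeats the search with a filter such as `{"kind": "fact"}`.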
Submitted 27 April, 2022;
originally announced April 2022.