-
Towards Automatic Error Recovery in Parsing Expression Grammars
Authors:
Sérgio Queiroz de Medeiros,
Fabio Mascarenhas
Abstract:
Error recovery is an essential feature for a parser that should be plugged into Integrated Development Environments (IDEs), which must build Abstract Syntax Trees (ASTs) even for syntactically invalid programs in order to offer features such as automated refactoring and code completion.
Parsing Expression Grammars (PEGs) are a formalism that naturally describes recursive top-down parsers using a restricted form of backtracking. Labeled failures are a conservative extension of PEGs that adds an error reporting mechanism for PEG parsers, and these labels can also be associated with recovery expressions to provide an error recovery mechanism. These expressions can use the full expressivity of PEGs to recover from syntactic errors.
Manually annotating a large grammar with labels and recovery expressions can be difficult. In this work, we present an algorithm that automatically annotates a PEG with labels, and builds their corresponding recovery expressions. We evaluate this algorithm by adding error recovery to the parser of the Titan programming language. The results show that, with a small amount of manual intervention, our algorithm can be used to produce error recovering parsers for PEGs where most of the alternatives are disjoint.
Submitted 4 July, 2025;
originally announced July 2025.
-
Solving systems of inequalities in two variables with floating point arithmetic
Authors:
Walter F. Mascarenhas
Abstract:
From a theoretical point of view, finding the solution set of a system of inequalities in only two variables is easy. However, if we want to get rigorous bounds on this set with floating point arithmetic, in all possible cases, then things are not so simple due to rounding errors. In this article we describe in detail an efficient data structure to represent this solution set and an efficient and robust algorithm to build it using floating point arithmetic. The data structure and the algorithm were developed as a building block for the rigorous solution of relevant practical problems. They were implemented in C++ and the code was carefully tested. This code is available as supplementary material to the arXiv version of this article, and it is distributed under the Mozilla Public License 2.0.
Submitted 20 September, 2021;
originally announced September 2021.
-
Computing the exact sign of sums of products with floating point arithmetic
Authors:
Walter F. Mascarenhas
Abstract:
In computational geometry, the construction of essential primitives like convex hulls, Voronoi diagrams and Delaunay triangulations requires the evaluation of the signs of determinants, which are sums of products. The same signs are needed for the exact solution of linear programming problems and systems of linear inequalities. Computing these signs exactly with inexact floating point arithmetic is challenging, and we present yet another algorithm for this task. Our algorithm is efficient and uses only floating point arithmetic, which is much faster than exact arithmetic. We prove that the algorithm is correct and provide efficient and tested C++ code for it.
Submitted 17 September, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
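The difficulty named in the abstract above can be reproduced with a 2x2 determinant. The sketch below is only an illustration of the problem, not the algorithm from the article: it compares a naive floating point evaluation with a slow exact reference built on Python's `fractions.Fraction`. The function names and the input values are assumptions chosen for this example.

```python
from fractions import Fraction


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def det2_sign_float(a, b, c, d):
    """Sign of a*d - b*c evaluated naively in floating point."""
    return sign(a * d - b * c)


def det2_sign_exact(a, b, c, d):
    """Sign of a*d - b*c in exact rational arithmetic (slow reference)."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    return sign(a * d - b * c)


# Inputs chosen so that a*d = 2**54 + 2**28 + 1 is not representable
# as a double, while b*c = 2**54 + 2**28 is. The exact determinant is 1.
a = d = float(2**27 + 1)
b = float(2**27)
c = float(2**27 + 2)

naive = det2_sign_float(a, b, c, d)  # rounding hides the positive sign
exact = det2_sign_exact(a, b, c, d)
```

Here the product `a * d` is rounded to the same double as `b * c`, so the naive evaluation reports a zero determinant although the exact value is positive.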
-
Automatic Syntax Error Reporting and Recovery in Parsing Expression Grammars
Authors:
Sérgio Queiroz de Medeiros,
Gilney de Azevedo Alvez Junior,
Fabio Mascarenhas
Abstract:
Error recovery is an essential feature for a parser that should be plugged into Integrated Development Environments (IDEs), which must build Abstract Syntax Trees (ASTs) even for syntactically invalid programs in order to offer features such as automated refactoring and code completion.
Parsing Expression Grammars (PEGs) are a formalism that naturally describes recursive top-down parsers using a restricted form of backtracking. Labeled failures are a conservative extension of PEGs that adds an error reporting mechanism for PEG parsers, and these labels can also be associated with recovery expressions to provide an error recovery mechanism. These expressions can use the full expressivity of PEGs to recover from syntactic errors.
Manually annotating a large grammar with labels and recovery expressions can be difficult. In this work, we present two approaches, Standard and Unique, to automatically annotate a PEG with labels, and to build their corresponding recovery expressions. The Standard approach annotates a grammar in a way similar to manual annotation, but it may insert labels incorrectly, while the Unique approach is more conservative in annotating a grammar and does not insert labels incorrectly.
We evaluate both approaches by using them to generate error recovering parsers for four programming languages: Titan, C, Pascal and Java. In our evaluation, the parsers produced using the Standard approach, after a manual intervention to remove the incorrectly added labels, gave an acceptable recovery for at least 70% of the files in each language. In turn, the acceptable recovery rate of the parsers produced via the Unique approach, without the need for manual intervention, ranged from 41% to 76%.
Submitted 1 October, 2019; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Syntax Error Recovery in Parsing Expression Grammars
Authors:
Sérgio Medeiros,
Fabio Mascarenhas
Abstract:
Parsing Expression Grammars (PEGs) are a formalism used to describe top-down parsers with backtracking. As PEGs do not provide a good error recovery mechanism, PEG-based parsers usually do not recover from syntax errors in the input, or recover from syntax errors using ad-hoc, implementation-specific features. The lack of proper error recovery makes PEG parsers unsuitable for use with Integrated Development Environments (IDEs), which need to build syntactic trees even for incomplete, syntactically invalid programs.
We propose a conservative extension, based on PEGs with labeled failures, that adds a syntax error recovery mechanism for PEGs. This extension associates recovery expressions to labels, where a label now not only reports a syntax error but also uses this recovery expression to reach a synchronization point in the input and resume parsing. We give an operational semantics of PEGs with this recovery mechanism, and use an implementation based on such semantics to build a robust parser for the Lua language. We evaluate the effectiveness of this parser, alone and in comparison with a Lua parser with automatic error recovery generated by ANTLR, a popular parser generator.
Submitted 28 June, 2018;
originally announced June 2018.
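The idea in the abstract above, a label that both reports an error and runs a recovery expression to reach a synchronization point, can be sketched with a hand-written parser for a toy language. This is not the operational semantics from the paper: the grammar (`NAME = NUMBER ;` statements), the label names `ErrAssign`, `ErrNum` and `ErrSemi`, and the single recovery rule (skip to the next `;` without consuming it) are all assumptions made for this example.

```python
def parse(tokens):
    """Parse `NAME = NUMBER ;` statements, recovering from labeled failures.

    Returns (ast, errors): ast holds (name, value) pairs, with value None
    where a number was missing; errors holds (label, token_index) pairs.
    """
    ast, errors, i = [], [], 0

    def expect(pred, label, i):
        """Match one token satisfying pred, or fail with `label` and recover."""
        if i < len(tokens) and pred(tokens[i]):
            return tokens[i], i + 1
        errors.append((label, i))
        # Hypothetical recovery expression shared by all labels:
        # skip tokens until the synchronization point ';' (not consumed).
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        return None, i

    while i < len(tokens):
        name, i = tokens[i], i + 1
        _, i = expect(lambda t: t == "=", "ErrAssign", i)
        value, i = expect(str.isdigit, "ErrNum", i)
        _, i = expect(lambda t: t == ";", "ErrSemi", i)
        ast.append((name, int(value) if value is not None else None))
    return ast, errors


# The second statement lacks its number, yet parsing continues past it.
ast, errors = parse("x = 1 ; y = ; z = 3 ;".split())
```

The parser still builds a tree for all three statements and records one `ErrNum` error, which is the behavior an IDE needs from an error recovering parser. A single shared recovery rule can report cascading errors; per-label recovery expressions, as the abstract describes, give finer control.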
-
Moore: Interval Arithmetic in C++20
Authors:
Walter F. Mascarenhas
Abstract:
This article presents the Moore library for interval arithmetic in C++20. It gives examples of how the library can be used, and explains the basic principles underlying its design.
Submitted 21 February, 2018;
originally announced February 2018.
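For readers unfamiliar with interval arithmetic, the following minimal sketch shows its core guarantee: the computed interval encloses the exact result. It does not use or imitate the Moore library's API. The `Interval` class and its methods are hypothetical names for this example, and outward rounding is approximated by widening each bound by one floating point step with `math.nextafter` (Python 3.9 or later).

```python
import math
from fractions import Fraction


class Interval:
    """A closed interval [lo, hi] with double precision end points."""

    def __init__(self, lo, hi):
        self.lo, self.hi = float(lo), float(hi)

    def __add__(self, other):
        # Round-to-nearest may land on either side of the exact sum, so
        # each bound is pushed one step outward to keep an enclosure.
        lo = math.nextafter(self.lo + other.lo, -math.inf)
        hi = math.nextafter(self.hi + other.hi, math.inf)
        return Interval(lo, hi)

    def contains(self, x):
        """Check membership exactly by comparing as rationals."""
        return Fraction(self.lo) <= x <= Fraction(self.hi)


# Enclose the exact sum of the doubles nearest to 0.1 and 0.2.
total = Interval(0.1, 0.1) + Interval(0.2, 0.2)
exact_sum = Fraction(0.1) + Fraction(0.2)
enclosed = total.contains(exact_sum)
```

Widening by one step on each side is slightly pessimistic compared with directed rounding modes, but it keeps the sketch portable.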
-
Decoding Lua: Formal Semantics for the Developer and the Semanticist
Authors:
Mallku Soldevila,
Beta Ziliani,
Bruno Silvestre,
Daniel Fridlender,
Fabio Mascarenhas
Abstract:
We provide formal semantics for a large subset of the Lua programming language, version 5.2. We validate our model by mechanizing it and testing it against the test suite of the reference Lua interpreter, confirming that our model accurately represents the language. In addition, we set ourselves an ambitious goal: to address both a PL semanticist (not necessarily versed in Lua) and a Lua developer (not necessarily versed in semantic frameworks). To the former, we present the peculiarities of the language and how we model them in a traditional small-step operational semantics, embedded within Felleisen-Hieb's reduction semantics with evaluation contexts. The mechanization is, naturally, performed in PLT Redex, the de facto tool for mechanizing reduction semantics.
To the reader unfamiliar with such concepts, we provide, as best we can within the space limitations, a gentle introduction to the model. It is our hope that developers of the different Lua implementations and dialects understand the model and consider it both for testing their work and for experimenting with new language features.
Submitted 7 June, 2017;
originally announced June 2017.
-
Moore: Interval Arithmetic in Modern C++
Authors:
Walter F. Mascarenhas
Abstract:
We present the library Moore, which implements Interval Arithmetic in modern C++. This library is based on a new feature in the C++ language called concepts, which reduces the problems caused by template meta programming, and leads to a new approach for implementing interval arithmetic libraries in C++.
Submitted 29 November, 2016;
originally announced November 2016.
-
Fast and accurate normalization of vectors and quaternions
Authors:
Walter F. Mascarenhas
Abstract:
We present fast and accurate ways to normalize two and three dimensional vectors and quaternions and compute their length. Our approach is an adaptation of ideas used in the linear algebra library LAPACK, and we believe that the computational geometry and computer aided design communities are not aware of the possibility of speeding up these fundamental operations in the robust way proposed here.
Submitted 16 January, 2018; v1 submitted 21 June, 2016;
originally announced June 2016.
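To show why normalizing a vector robustly is not trivial, the sketch below contrasts the textbook formula with a classic scaling trick of the kind used in scaled norm routines. The abstract does not spell out the article's method, so this is an illustration of the robustness issue only, under that assumption. The function names are hypothetical, and the extra divisions make this version slower than the naive one, whereas the article is concerned with doing this fast.

```python
import math


def naive_length(v):
    """Textbook Euclidean length; the squares can overflow to infinity."""
    return math.sqrt(sum(x * x for x in v))


def scaled_length(v):
    """Euclidean length with components scaled by the largest magnitude."""
    scale = max(abs(x) for x in v)
    if scale == 0.0 or math.isinf(scale):
        return scale
    # Every ratio has magnitude at most 1, so the squares cannot overflow.
    return scale * math.sqrt(sum((x / scale) ** 2 for x in v))


def normalize(v):
    """Return v divided by its scaled length."""
    n = scaled_length(v)
    if n == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / n for x in v]


# A (3, 4, 5) triangle scaled by 2**600: the true length is 5 * 2**600.
big = [3 * 2.0**600, 4 * 2.0**600]
naive = naive_length(big)    # overflows
robust = scaled_length(big)
unit = normalize(big)
```

With these inputs the naive formula returns infinity, while the scaled version returns the representable length and a unit vector close to `[0.6, 0.8]`.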
-
Error Reporting in Parsing Expression Grammars
Authors:
André Murbach Maidl,
Sérgio Medeiros,
Fabio Mascarenhas,
Roberto Ierusalimschy
Abstract:
Parsing Expression Grammars (PEGs) describe top-down parsers. Unfortunately, the error-reporting techniques used in conventional top-down parsers do not directly apply to parsers based on PEGs, so they have to be somehow simulated. While the PEG formalism has no account of semantic actions, actual PEG implementations add them, and we show how to simulate an error-reporting heuristic through these semantic actions.
We also propose a complementary error reporting strategy that may lead to better error messages: labeled failures. This approach is inspired by exception handling of programming languages, and lets a PEG define different kinds of failure, with each ordered choice operator specifying which kinds it catches. Labeled failures give a way to annotate grammars for better error reporting, to express some of the error reporting strategies used by deterministic parser combinators, and to encode predictive top-down parsing in a PEG.
Submitted 13 July, 2016; v1 submitted 26 May, 2014;
originally announced May 2014.
-
On the Relation between Context-Free Grammars and Parsing Expression Grammars
Authors:
Fabio Mascarenhas,
Sérgio Medeiros,
Roberto Ierusalimschy
Abstract:
Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have several similarities and a few differences in both their syntax and semantics, but they are usually presented through formalisms that hinder a proper comparison. In this paper we present a new formalism for CFGs that highlights the similarities and differences between them. The new formalism borrows from PEGs the use of parsing expressions and the recognition-based semantics. We show how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs. We also prove, based on these new formalisms, how LL(1) grammars define the same language whether interpreted as CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular grammars have simple language-preserving translations from CFGs to PEGs.
Submitted 13 February, 2014; v1 submitted 10 April, 2013;
originally announced April 2013.
-
From Regexes to Parsing Expression Grammars
Authors:
Sérgio Medeiros,
Fabio Mascarenhas,
Roberto Ierusalimschy
Abstract:
Most scripting languages nowadays use regex pattern-matching libraries. These regex libraries borrow the syntax of regular expressions, but have an informal semantics that is different from the semantics of regular expressions, removing the commutativity of alternation and adding ad-hoc extensions that cannot be expressed by formalisms for efficient recognition of regular languages, such as deterministic finite automata.
Parsing Expression Grammars are a formalism that can describe all deterministic context-free languages and has a simple computational model. In this paper, we present a formalization of regexes via transformation to Parsing Expression Grammars. The proposed transformation easily accommodates several of the common regex extensions, giving a formal meaning to them. It also provides a clear computational model that helps to estimate the efficiency of regex-based matchers, and a basis for specifying provably correct optimizations for them.
Submitted 17 October, 2012;
originally announced October 2012.
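The semantic gap between regex alternation and PEG ordered choice mentioned in the abstract above can be seen in a few lines. The sketch below is not the transformation proposed in the paper. It compares Python's `re` module with three hypothetical PEG combinators (`lit`, `seq`, `choice`) written for this example, where each parser takes a text and a position and returns the new position or `None`.

```python
import re


def lit(s):
    """PEG literal: match s at position i."""
    def parse(text, i):
        return i + len(s) if text.startswith(s, i) else None
    return parse


def seq(*parsers):
    """PEG sequence: run each parser in turn, fail if any fails."""
    def parse(text, i):
        for p in parsers:
            i = p(text, i)
            if i is None:
                return None
        return i
    return parse


def choice(*parsers):
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parse(text, i):
        for p in parsers:
            j = p(text, i)
            if j is not None:
                return j
        return None
    return parse


# Regex (a|ab)c backtracks into the alternation and matches "abc".
regex_matches = re.fullmatch(r"(a|ab)c", "abc") is not None

# The operator-for-operator PEG commits to "a" and then fails on "b".
peg = seq(choice(lit("a"), lit("ab")), lit("c"))
peg_matches = peg("abc", 0) == len("abc")

# Reordering the alternatives restores the match for this input.
peg_reordered = seq(choice(lit("ab"), lit("a")), lit("c"))
reordered_matches = peg_reordered("abc", 0) == len("abc")
```

Since the regex engine revisits the alternation after a later failure and the PEG does not, swapping operators one for one does not preserve meaning, which suggests why a dedicated transformation is needed.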
-
Left Recursion in Parsing Expression Grammars
Authors:
Sérgio Medeiros,
Fabio Mascarenhas,
Roberto Ierusalimschy
Abstract:
Parsing Expression Grammars (PEGs) are a formalism that can describe all deterministic context-free languages through a set of rules that specify a top-down parser for some language. PEGs are easy to use, and there are efficient implementations of PEG libraries in several programming languages.
A frequently missed feature of PEGs is left recursion, which is commonly used in Context-Free Grammars (CFGs) to encode left-associative operations. We present a simple conservative extension to the semantics of PEGs that gives useful meaning to direct and indirect left-recursive rules, and show that our extensions make it easy to express left-recursive idioms from CFGs in PEGs, with similar results. We prove the conservativeness of these extensions, and also prove that they work with any left-recursive PEG.
PEGs can also be compiled to programs in a low-level parsing machine. We present an extension to the semantics of the operations of this parsing machine that lets it interpret left-recursive PEGs, and prove that this extension is correct with regard to our semantics for left-recursive PEGs.
Submitted 13 February, 2014; v1 submitted 2 July, 2012;
originally announced July 2012.