-
String Theories involving Regular Membership Predicates: From Practice to Theory and Back
Authors:
Murphy Berzish,
Joel D. Day,
Vijay Ganesh,
Mitja Kulczynski,
Florin Manea,
Federico Mora,
Dirk Nowotka
Abstract:
Widespread use of string solvers in formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context, especially for real-world cases. Designing an algorithm for the (generally undecidable) satisfiability problem for systems of string constraints requires a thorough understanding of the structure of constraints present in the targeted cases. In this paper, we investigate benchmarks presented in the literature containing regular expression membership predicates, extract different first-order logic theories, and prove their decidability or undecidability. Notably, the most common theories in real-world benchmarks are PSPACE-complete and directly lead to the implementation of a more efficient algorithm for solving string constraints.
Submitted 15 May, 2021;
originally announced May 2021.
-
An SMT Solver for Regular Expressions and Linear Arithmetic over String Length
Authors:
Murphy Berzish,
Mitja Kulczynski,
Federico Mora,
Florin Manea,
Joel D. Day,
Dirk Nowotka,
Vijay Ganesh
Abstract:
We present a novel length-aware solving algorithm for the quantifier-free first-order theory over regex membership predicate and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. A crucial insight that underpins our algorithm is that real-world instances contain a wealth of information about upper and lower bounds on lengths of strings under constraints, and such information can be used very effectively to simplify operations on automata representing regular expressions. Additionally, we present a number of novel general heuristics, such as the prefix/suffix method, that can be used in conjunction with a variety of regex solving algorithms, making them more efficient. We showcase the power of our algorithm and heuristics via an extensive empirical evaluation over a large and diverse benchmark of 57256 regex-heavy instances, almost 75% of which are derived from industrial applications or contributed by other solver developers. Our solver outperforms five other state-of-the-art string solvers, namely, CVC4, OSTRICH, Z3seq, Z3str3, and Z3-Trau, over this benchmark, in particular achieving a 2.4x speedup over CVC4, 4.4x speedup over Z3seq, 6.4x speedup over Z3-Trau, 9.1x speedup over Z3str3, and 13x speedup over OSTRICH.
Submitted 7 May, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
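The abstract above notes that bounds on string lengths can simplify regex reasoning. The following is a minimal sketch of that general idea only, not the paper's algorithm or its Z3 implementation: it computes the range of possible match lengths for a tiny regex AST and rejects a membership constraint early when that range cannot meet the length bounds. The AST classes and the function names `length_range` and `may_be_satisfiable` are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Lit:
    text: str  # matches exactly this string


@dataclass(frozen=True)
class Concat:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    inner: "Regex"


Regex = Union[Lit, Concat, Alt, Star]


def length_range(r: Regex) -> Tuple[int, Optional[int]]:
    """Return (min_len, max_len) of strings matched by r; max_len None means unbounded."""
    if isinstance(r, Lit):
        return len(r.text), len(r.text)
    if isinstance(r, Concat):
        lmin, lmax = length_range(r.left)
        rmin, rmax = length_range(r.right)
        upper = None if lmax is None or rmax is None else lmax + rmax
        return lmin + rmin, upper
    if isinstance(r, Alt):
        lmin, lmax = length_range(r.left)
        rmin, rmax = length_range(r.right)
        upper = None if lmax is None or rmax is None else max(lmax, rmax)
        return min(lmin, rmin), upper
    if isinstance(r, Star):
        _, imax = length_range(r.inner)
        # Zero repetitions always match; any non-empty inner match makes the length unbounded.
        return 0, (0 if imax == 0 else None)
    raise TypeError(f"unknown regex node: {r!r}")


def may_be_satisfiable(r: Regex, lo: int, hi: Optional[int]) -> bool:
    """Necessary (not sufficient) check that some x with lo <= len(x) <= hi can match r."""
    rmin, rmax = length_range(r)
    if hi is not None and rmin > hi:
        return False
    if rmax is not None and rmax < lo:
        return False
    return True
```

For example, `may_be_satisfiable(Concat(Lit("ab"), Star(Lit("c"))), 0, 1)` returns `False`, because every match has length at least 2. The check is only a filter: a `True` result still requires a full membership procedure, since not every length inside the range need be achievable.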
-
Z3str3: A String Solver with Theory-aware Branching
Authors:
Murphy Berzish,
Yunhui Zheng,
Vijay Ganesh
Abstract:
We present a new string SMT solver, Z3str3, that is faster than its competitors Z3str2, Norn, CVC4, S3, and S3P over a majority of three industrial-strength benchmarks, namely Kaluza, PISA, and IBM AppScan. Z3str3 supports string equations, linear arithmetic over length function, and regular language membership predicate. The key algorithmic innovation behind the efficiency of Z3str3 is a technique we call theory-aware branching, wherein we modify Z3's branching heuristic to take into account the structure of theory literals to compute branching activities. In the traditional DPLL(T) architecture, the structure of theory literals is hidden from the DPLL(T) SAT solver because of the Boolean abstraction constructed over the input theory formula. By contrast, the theory-aware technique presented in this paper exposes the structure of theory literals to the DPLL(T) SAT solver's branching heuristic, thus enabling it to make much smarter decisions during its search than otherwise. As a consequence, Z3str3 has better performance than its competitors.
Submitted 25 April, 2017;
originally announced April 2017.
-
A Solver for a Theory of Strings and Bit-vectors
Authors:
Sanu Subramanian,
Murphy Berzish,
Yunhui Zheng,
Omer Tripp,
Vijay Ganesh
Abstract:
We present a solver for a many-sorted first-order quantifier-free theory $T_{w,bv}$ of string equations, string length represented as bit-vectors, and bit-vector arithmetic aimed at formal verification, automated testing, and security analysis of C/C++ applications. Our key motivation for building such a solver is the observation that existing string solvers are not efficient at modeling the string/bit-vector combination. Current approaches either reduce strings to bit-vectors and use a bit-vector solver as a backend, or model bit-vectors as natural numbers and use a solver for the combined theory of strings and natural numbers. Both these approaches are inefficient for different reasons. Modeling strings as bit-vectors destroys structure inherent in string equations thus missing opportunities for efficiently deciding such formulas, and modeling bit-vectors as natural numbers is known to be inefficient. Hence, there is a clear need for a solver that models strings and bit-vectors natively.
Our solver Z3strBV is a decision procedure for the theory $T_{w,bv}$ combining solvers for bit-vector and string equations. We demonstrate experimentally that Z3strBV is significantly more efficient than reduction of string/bit-vector constraints to strings/natural numbers. Additionally, we prove decidability for the theory $T_{w,bv}$. We also propose two optimizations which can be adapted to other contexts. The first accelerates convergence on a consistent assignment of string lengths, and the second, dubbed library-aware SMT solving, fixes summaries for built-in string functions (e.g., {\tt strlen} in C/C++), which Z3strBV uses directly instead of analyzing the functions from scratch each time. Finally, we demonstrate experimentally that Z3strBV is able to detect nontrivial overflows in real-world system-level code, as confirmed against 7 security vulnerabilities from CVE and Mozilla database.
Submitted 30 May, 2016;
originally announced May 2016.
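The abstract above contrasts modeling string lengths as bit-vectors with modeling them as natural numbers. As a rough illustration of why the two differ, and not a reproduction of Z3strBV, the sketch below evaluates a length guard in fixed-width unsigned arithmetic. The helper names `bv_add` and `guard_passes` and the concrete values are assumptions made up for this example.

```python
def bv_add(a: int, b: int, width: int) -> int:
    """Unsigned addition on width-bit vectors: the result wraps modulo 2**width."""
    return (a + b) & ((1 << width) - 1)


def guard_passes(len_x: int, len_y: int, capacity: int, width: int) -> bool:
    """Model a guard such as `len_x + len_y <= capacity` computed in width-bit arithmetic."""
    return bv_add(len_x, len_y, width) <= capacity
```

With 8-bit lengths, `guard_passes(200, 100, 64, 8)` is `True` because 200 + 100 wraps to 44, even though the combined length of 300 exceeds the capacity. Over the natural numbers the same guard fails, so a model that ignores the bit width would not expose this kind of overflow.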
-
Undecidability of a Theory of Strings, Linear Arithmetic over Length, and String-Number Conversion
Authors:
Vijay Ganesh,
Murphy Berzish
Abstract:
In recent years there has been considerable interest in theories over string equations, length function, and string-number conversion predicate within the formal verification, software engineering, and security communities. SMT solvers for these theories, such as Z3str2, CVC4, and S3, are of immense practical value in exposing security vulnerabilities in string-intensive programs. Additionally, there are many open decidability and complexity-theoretic questions in the context of theories over strings that are of great interest to mathematicians. Motivated by the above-mentioned applications and open questions, we study a first-order, many-sorted, quantifier-free theory $T_{s,n}$ of string equations, linear arithmetic over string length, and string-number conversion predicate and prove three theorems. First, we prove that the satisfiability problem for the theory $T_{s,n}$ is undecidable via a reduction from a theory of linear arithmetic over natural numbers with power predicate, which we call power arithmetic. Second, we show that the string-numeric conversion predicate is expressible in terms of the power predicate, string equations, and length function. This second theorem, in conjunction with the reduction we propose for the undecidability theorem, suggests that the power predicate is expressible in terms of word equations and length function if and only if the string-numeric conversion predicate is also expressible in the same fragment. Such results are very useful tools in comparing the expressive power of different theories, and for establishing decidability and complexity results. Third, we provide a consistent axiomatization $Γ$ for the functions and predicates of $T_{s,n}$. Additionally, we prove that the theory $T_Γ$, obtained via logical closure of $Γ$, is not a complete theory.
Submitted 26 October, 2016; v1 submitted 30 May, 2016;
originally announced May 2016.