-
Elimination of annotation dependencies in validation for Modern JSON Schema
Authors:
Lyes Attouche,
Mohamed-Amine Baazizi,
Dario Colazzo,
Giorgio Ghelli,
Stefan Klessinger,
Carlo Sartiani,
Stefanie Scherzinger
Abstract:
JSON Schema is a logical language used to define the structure of JSON values. JSON Schema syntax is based on nested schema objects. In all versions of JSON Schema until Draft-07, collectively known as Classical JSON Schema, the semantics of a schema was entirely described by the set of JSON values that it validates. This semantics was the basis for a thorough theoretical study and for the development of tools to decide satisfiability and equivalence of schemas. Unfortunately, Classical JSON Schema suffered from a severe limitation in its ability to express extensions of object schemas, which led to the introduction, with Draft 2019-09, of two disruptive features: annotation dependency and dynamic references.
These new features undermine the previously developed semantic theory, and the algorithms used to decide satisfiability for Classical JSON Schema are not easy to extend. One possible solution is rewriting a schema written in Modern JSON Schema into an equivalent schema in Classical JSON Schema.
In this paper we prove that the elimination of annotation dependent keywords cannot, in general, avoid an exponential increase of the schema dimension. We provide an algorithm to eliminate these keywords that, despite the theoretical lower bound, behaves quite well in practice, as we verify with an extensive set of experiments.
Submitted 14 March, 2025;
originally announced March 2025.
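The object-extension limitation and the annotation dependency mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it is a hypothetical toy validator (the names `validate`, `base`, `classical`, and `modern` are assumptions made for illustration) that covers only `properties`, `allOf`, `additionalProperties: false`, and `unevaluatedProperties: false`, and treats every property subschema as accepting any value. It shows that `additionalProperties` sees only the adjacent `properties`, while `unevaluatedProperties` depends on the annotation (the set of evaluated property names) produced by the subschemas under `allOf`.

```python
from typing import Any

Schema = dict[str, Any]


def validate(schema: Schema, obj: dict[str, Any]) -> tuple[bool, set[str]]:
    """Toy validator for a tiny schema fragment (illustrative assumption).

    Returns (valid, evaluated), where `evaluated` is the annotation: the
    names of the members of `obj` evaluated by `properties`, directly or
    through `allOf`. Property subschemas are assumed to accept any value.
    """
    valid = True
    local = set(schema.get("properties", {}))
    evaluated = local & set(obj)

    # In-place applicator: annotations of successful branches are collected.
    for sub in schema.get("allOf", []):
        ok, seen = validate(sub, obj)
        if ok:
            evaluated |= seen
        else:
            valid = False

    # Classical keyword: looks only at the adjacent `properties`.
    if schema.get("additionalProperties") is False and set(obj) - local:
        valid = False

    # Modern keyword: depends on the annotations collected above.
    if schema.get("unevaluatedProperties") is False and set(obj) - evaluated:
        valid = False

    # A failing schema produces no annotations.
    return valid, (evaluated if valid else set())


base: Schema = {"properties": {"name": {}}}
classical: Schema = {"allOf": [base], "properties": {"age": {}},
                     "additionalProperties": False}
modern: Schema = {"allOf": [base], "properties": {"age": {}},
                  "unevaluatedProperties": False}
person = {"name": "Ada", "age": 36}
```

With these definitions, `validate(classical, person)` rejects the object because `name` is not among the adjacent `properties`, while `validate(modern, person)` accepts it and still rejects an object with a member that no subschema evaluates.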
-
Validation of Modern JSON Schema: Formalization and Complexity
Authors:
Lyes Attouche,
Mohamed-Amine Baazizi,
Dario Colazzo,
Giorgio Ghelli,
Carlo Sartiani,
Stefanie Scherzinger
Abstract:
JSON Schema is the de-facto standard schema language for JSON data. The language went through many minor revisions, but the most recent versions of the language added two novel features, dynamic references and annotation-dependent validation, that change the evaluation model. Modern JSON Schema is the name used to indicate all versions from Draft 2019-09, which are characterized by these new features, while Classical JSON Schema is used to indicate the previous versions.
These new "modern" features make the schema language quite difficult to understand, and have generated many discussions about the correct interpretation of their official specifications; for this reason we undertook the task of their formalization. During this process, we also analyzed the complexity of data validation in Modern JSON Schema, with the idea of confirming the PTIME complexity of Classical JSON Schema validation, and we were surprised to discover a completely different truth: data validation, which is expected to be an extremely efficient process, acquires, with Modern JSON Schema features, a PSPACE complexity.
In this paper, we give the first formal description of Modern JSON Schema, which we consider a central contribution of the work that we present here. We then prove that its data validation problem is PSPACE-complete. We prove that the origin of the problem lies in dynamic references, and not in annotation-dependent validation. We study the schema and data complexities, showing that the problem is PSPACE-complete with respect to the schema size even with a fixed instance, but is in PTIME when the schema is fixed and only the instance size is allowed to vary. Finally, we run experiments that show that there are families of schemas where the difference in asymptotic complexity between dynamic and static references is extremely visible, even with small schemas.
Submitted 1 February, 2024; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Negation-Closure for JSON Schema
Authors:
Mohamed-Amine Baazizi,
Dario Colazzo,
Giorgio Ghelli,
Carlo Sartiani,
Stefanie Scherzinger
Abstract:
JSON Schema is an evolving standard for describing families of JSON documents. It is a logical language, based on a set of assertions that describe features of the JSON value under analysis and on logical or structural combinators for these assertions, including a negation operator. Most logical languages with negation enjoy negation closure, that is, for every operator they have a negation dual that expresses its negation. We show that this is not the case for JSON Schema, study how that changed with the latest versions of the Draft, and discuss how the language may be enriched accordingly. In the process, we define an algebraic reformulation of JSON Schema, which we successfully employed in a prototype system for generating schema witnesses.
Submitted 27 February, 2022;
originally announced February 2022.
-
Witness Generation for JSON Schema
Authors:
Lyes Attouche,
Mohamed-Amine Baazizi,
Dario Colazzo,
Giorgio Ghelli,
Carlo Sartiani,
Stefanie Scherzinger
Abstract:
JSON Schema is an important, evolving standard schema language for families of JSON documents. It is based on a complex combination of structural and Boolean assertions, and features negation and recursion. The static analysis of JSON Schema documents comprises practically relevant problems, including schema satisfiability, inclusion, and equivalence. These three problems can be reduced to witness generation: given a schema, generate an element of the schema, if it exists, and report failure otherwise. Schema satisfiability, inclusion, and equivalence have been shown to be decidable, by reduction to reachability in alternating tree automata. However, no witness generation algorithm has yet been formally described. We contribute a first, direct algorithm for JSON Schema witness generation. We study its effectiveness and efficiency, in experiments over several schema collections, including thousands of real-world schemas. Our focus is on the completeness of the language, where we only exclude the uniqueItems operator, and on the ability of the algorithm to run in a reasonable time on a large set of real-world examples, despite the exponential complexity of the underlying problem.
Submitted 16 July, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
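The reduction of satisfiability, inclusion, and equivalence to witness generation stated in the abstract above can be written out as follows. This is a sketch in assumed notation, not taken from the paper: $\mathit{witness}(S)$ stands for a hypothetical procedure that returns a JSON value satisfying $S$, or $\mathit{fail}$ if none exists, and $\wedge$, $\neg$ stand for schema conjunction and negation (expressible in JSON Schema with `allOf` and `not`).

```latex
\begin{align*}
S \text{ is satisfiable} &\iff \mathit{witness}(S) \neq \mathit{fail} \\
S_1 \subseteq S_2 &\iff \mathit{witness}(S_1 \wedge \neg S_2) = \mathit{fail} \\
S_1 \equiv S_2 &\iff S_1 \subseteq S_2 \ \text{and} \ S_2 \subseteq S_1
\end{align*}
```

When inclusion fails, the value returned by $\mathit{witness}(S_1 \wedge \neg S_2)$ is a concrete counterexample: it satisfies $S_1$ but not $S_2$.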
-
An Empirical Study on the "Usage of Not" in Real-World JSON Schema Documents (Long Version)
Authors:
Mohamed-Amine Baazizi,
Dario Colazzo,
Giorgio Ghelli,
Carlo Sartiani,
Stefanie Scherzinger
Abstract:
In this paper, we study the usage of negation in JSON Schema data modeling. Negation is a logical operator that is rarely present in type systems and schema description languages, since it complicates decision problems. As a consequence, many software tools, but also formal frameworks for working with JSON Schema, do not fully support negation. As of today, the question of whether covering negation is practically relevant, or a mainly theoretical exercise (albeit challenging), is open. This motivates us to study whether negation is really used in practice, for which aims, and whether it could, in principle, be replaced by simpler operators. We have collected the most diverse corpus of JSON Schema documents analyzed so far, based on a crawl of 90k open source schemas hosted on GitHub. We perform a systematic analysis, quantify usage patterns of negation, and also qualitatively analyze schemas. We show that negation is indeed used, following a stable set of patterns, with the potential to mature into design patterns.
Submitted 19 July, 2021;
originally announced July 2021.
-
Not Elimination and Witness Generation for JSON Schema
Authors:
Mohamed-Amine Baazizi,
Dario Colazzo,
Giorgio Ghelli,
Carlo Sartiani,
Stefanie Scherzinger
Abstract:
JSON Schema is an evolving standard for the description of families of JSON documents. JSON Schema is a logical language, based on a set of assertions that describe features of the JSON value under analysis and on logical or structural combinators for these assertions. As for any logical language, problems like satisfaction, not-elimination, schema satisfiability, schema inclusion and equivalence, as well as witness generation, have both theoretical and practical interest. While satisfaction is trivial, all other problems are quite difficult, due to the combined presence of negation, recursion, and complex assertions in JSON Schema. To make things even more complex and interesting, JSON Schema is not algebraic, since we have both syntactic and semantic interactions between different keywords in the same schema object.
With these motivations, we present in this paper an algebraic characterization of JSON Schema, obtained by adding suitable operators and by mirroring existing ones. We then present algebra-based approaches for dealing with the not-elimination and witness generation problems, which play a central role as they lead to solutions for the other complex problems mentioned above.
Submitted 7 May, 2021; v1 submitted 30 April, 2021;
originally announced April 2021.
-
Typing Regular Path Query Languages for Data Graphs
Authors:
Dario Colazzo,
Carlo Sartiani
Abstract:
Regular path query languages for data graphs are essentially untyped. The lack of type information greatly limits the optimization opportunities for query engines and makes application development more complex. In this paper we discuss a simple, yet expressive, schema language for edge-labelled data graphs. This schema language is then used to define a query type inference approach with good precision properties.
Submitted 7 July, 2015;
originally announced July 2015.