-
WaterFowl, a Compact, Self-indexed RDF Store with Inference-enabled Dictionaries
Authors:
Olivier Curé,
Guillaume Blin,
Dominique Revuz,
David Faye
Abstract:
In this paper, we present a novel approach -- called WaterFowl -- for the storage of RDF triples that addresses some key issues in the contexts of big data and the Semantic Web. The architecture of our prototype, largely based on the use of succinct data structures, enables the representation of triples in a self-indexed, compact manner without requiring decompression at query answering time. More…
▽ More
In this paper, we present a novel approach -- called WaterFowl -- for the storage of RDF triples that addresses some key issues in the contexts of big data and the Semantic Web. The architecture of our prototype, largely based on the use of succinct data structures, enables the representation of triples in a self-indexed, compact manner without requiring decompression at query answering time. Moreover, it is adapted to efficiently support RDF and RDFS entailment regimes thanks to an optimized encoding of ontology concepts and properties that does not require a complete inference materialization or extensive query rewriting algorithms. This approach implies to make a distinction between the terminological and the assertional components of the knowledge base early in the process of data preparation, i.e., preprocessing the data before storing it in our structures. The paper describes the complete architecture of this system and presents some preliminary results obtained from evaluations conducted on our first prototype.
△ Less
Submitted 20 January, 2014;
originally announced January 2014.
-
Towards a better insight of RDF triples Ontology-guided Storage system abilities
Authors:
Olivier Curé,
David Faye,
Guillaume Blin
Abstract:
The vision of the Semantic Web is becoming a reality with billions of RDF triples being distributed over multiple queryable end-points (e.g. Linked Data). Although there has been a body of work on RDF triples persistent storage, it seems that, considering reasoning dependent queries, the problem of providing an efficient, in terms of performance, scalability and data redundancy, partitioning of th…
▽ More
The vision of the Semantic Web is becoming a reality with billions of RDF triples being distributed over multiple queryable end-points (e.g. Linked Data). Although there has been a body of work on RDF triples persistent storage, it seems that, considering reasoning dependent queries, the problem of providing an efficient, in terms of performance, scalability and data redundancy, partitioning of the data is still open. In regards to recent data partitioning studies, it seems reasonable to think that data partitioning should be guided considering several directions (e.g. ontology, data, application queries). This paper proposes several contributions: describe an overview of what a road map for data partitioning for RDF data efficient and persistent storage should contain, present some preliminary results and analysis on the particular case of ontology-guided (property hierarchy) partitioning and finally introduce a set of semantic query rewriting rules to support querying RDF data needing OWL inferences
△ Less
Submitted 27 June, 2013;
originally announced June 2013.
-
Comparing RNA structures using a full set of biologically relevant edit operations is intractable
Authors:
Guillaume Blin,
Sylvie Hamel,
Stéphane Vialette
Abstract:
Arc-annotated sequences are useful for representing structural information of RNAs and have been extensively used for comparing RNA structures in both terms of sequence and structural similarities. Among the many paradigms referring to arc-annotated sequences and RNA structures comparison (see \cite{IGMA_BliDenDul08} for more details), the most important one is the general edit distance. The pro…
▽ More
Arc-annotated sequences are useful for representing structural information of RNAs and have been extensively used for comparing RNA structures in both terms of sequence and structural similarities. Among the many paradigms referring to arc-annotated sequences and RNA structures comparison (see \cite{IGMA_BliDenDul08} for more details), the most important one is the general edit distance. The problem of computing an edit distance between two non-crossing arc-annotated sequences was introduced in \cite{Evans99}. The introduced model uses edit operations that involve either single letters or pairs of letters (never considered separately) and is solvable in polynomial-time \cite{ZhangShasha:1989}. To account for other possible RNA structural evolutionary events, new edit operations, allowing to consider either silmutaneously or separately letters of a pair were introduced in \cite{jiangli}; unfortunately at the cost of computational tractability. It has been proved that comparing two RNA secondary structures using a full set of biologically relevant edit operations is {\sf\bf NP}-complete. Nevertheless, in \cite{DBLP:conf/spire/GuignonCH05}, the authors have used a strong combinatorial restriction in order to compare two RNA stem-loops with a full set of biologically relevant edit operations; which have allowed them to design a polynomial-time and space algorithm for comparing general secondary RNA structures. In this paper we will prove theoretically that comparing two RNA structures using a full set of biologically relevant edit operations cannot be done without strong combinatorial restrictions.
△ Less
Submitted 20 December, 2008;
originally announced December 2008.