gSMat: A Scalable Sparse Matrix-based Join for SPARQL Query Processing

Zhang, Xiaowang; Zhang, Mingyue; Peng, Peng; Song, Jiaming; Feng, Zhiyong; Zou, Lei

arXiv:1807.07691 (cs)

[Submitted on 20 Jul 2018]

Abstract: The Resource Description Framework (RDF) has been widely used to represent information on the web, and SPARQL is the standard query language for manipulating RDF data. SPARQL queries often contain many joins, which are the main efficiency bottleneck in query processing. Moreover, real-world RDF datasets are typically highly sparse: a resource usually relates to only a few other resources, even when the total number of resources is large. In this paper, we propose a sparse matrix-based (SM-based) SPARQL query processing approach over RDF datasets that considers both join optimization and data sparsity. First, we present an SM-based storage for RDF datasets that improves storage efficiency by storing only valid edges, and we introduce a predicate-based hash index on this storage. Second, we develop a scalable SM-based join algorithm for SPARQL query processing. Finally, we analyze the overall cost by accumulating all intermediate results and design a query plan generation algorithm. In addition, we extend our SM-based join algorithm to GPUs to parallelize SPARQL query processing. We have evaluated our approach against state-of-the-art RDF engines on benchmark RDF datasets, and the experimental results show that our proposal significantly improves SPARQL query processing with high scalability.
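As a rough illustration of the storage and join ideas the abstract describes, the Python sketch below keeps one sparse boolean matrix per predicate (only existing edges are stored), uses a predicate-keyed dictionary as the hash index, and joins two triple patterns that share a variable by walking rows the way sparse matrix multiplication does. It also counts the join's intermediate results without materializing them, as a simple stand-in for a cost estimate. This is not the gSMat implementation: the names `SparseTripleStore`, `chain_join`, and `join_size` are hypothetical, and the sketch omits the paper's query plan generation algorithm and GPU parallelism.

```python
from collections import defaultdict


class SparseTripleStore:
    """Toy RDF store (hypothetical): one sparse boolean matrix per predicate.

    Each matrix is a dict {subject: set(objects)}, so only edges that
    actually exist are stored (no zero entries). The outer dict acts as
    a predicate-keyed hash index.
    """

    def __init__(self):
        self._by_pred = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self._by_pred[p][s].add(o)

    def matrix(self, p):
        # Sparse matrix for predicate p; an empty dict if p is unknown.
        return self._by_pred.get(p, {})


def chain_join(a, b):
    """Join patterns (?x a ?y) and (?y b ?z) on the shared variable ?y.

    Proceeds row by row like the sparse product A x B, but keeps every
    (x, y, z) binding instead of aggregating over y.
    """
    results = []
    for x in sorted(a):
        for y in sorted(a[x]):
            for z in sorted(b.get(y, ())):
                results.append((x, y, z))
    return results


def join_size(a, b):
    # Number of results the join would produce, counted without
    # materializing them (a simple intermediate-result cost estimate).
    return sum(len(b.get(y, ())) for row in a.values() for y in row)


if __name__ == "__main__":
    store = SparseTripleStore()
    store.add("alice", "knows", "bob")
    store.add("alice", "knows", "carol")
    store.add("bob", "worksAt", "acme")
    store.add("carol", "worksAt", "initech")
    store.add("dave", "worksAt", "acme")

    knows, works = store.matrix("knows"), store.matrix("worksAt")
    print(join_size(knows, works))
    print(chain_join(knows, works))
```

On this toy data the script prints `2` and then `[('alice', 'bob', 'acme'), ('alice', 'carol', 'initech')]`; the `dave` row is never touched because only rows reachable through `knows` are visited. A planner in this style could compare `join_size` values for alternative join orders and sum them along each candidate plan.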

Comments:	13 pages
Subjects:	Databases (cs.DB)
ACM classes:	H.2
Cite as:	arXiv:1807.07691 [cs.DB]
	(or arXiv:1807.07691v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1807.07691

Submission history

From: Xiaowang Zhang
[v1] Fri, 20 Jul 2018 01:52:45 UTC (116 KB)