RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query

Mei, Honghui; Chen, Wei; Wei, Yating; Hu, Yuanzhe; Zhou, Shuyue; Lin, Bingru; Zhao, Ying; Xia, Jiazhi

doi:10.1109/TVCG.2019.2934800

arXiv:1908.02005 (cs)

[Submitted on 6 Aug 2019 (v1), last revised 11 Oct 2019 (this version, v2)]

Abstract: Analysts commonly investigate data distributions derived from statistical aggregations, represented by charts such as histograms and binned scatterplots, to visualize and analyze large-scale datasets. Such exploration implicitly executes aggregate queries. Because datasets are often extremely large, response times are typically accelerated by precomputing data cubes. However, queries are then limited to the predefined binning schema of the preprocessed data cubes, which hinders analysts from flexibly adjusting visual specifications to investigate implicit patterns in the data effectively. To address this limitation, RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes: an R-tree-based space partitioning scheme that captures the data distribution, a locality-sensitive hashing technique that achieves locality-preserving random access to data items, and a summed area table scheme that supports interactive queries of aggregated values with linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data at user-adjustable granularities. We demonstrate the efficiency and utility of our approach through various experiments on real-world datasets and an analysis of its time and space complexity.
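As a rough illustration of why a summed area table frees aggregate queries from a fixed binning schema, the sketch below precomputes prefix sums once over a fine 2D grid of counts and then answers any axis-aligned rectangular aggregate with four lookups, so coarser bins with arbitrary (even uneven) edges can be produced without touching the raw data again. This is a minimal, generic sketch that assumes the data have already been counted into a fine grid. The function names `build_sat`, `range_sum`, and `rebin` and the sample counts are hypothetical. It does not reproduce RSATree's R-tree partitioning or locality-sensitive hashing.

```python
from typing import List

Grid = List[List[int]]


def build_sat(counts: Grid) -> Grid:
    """Build a (rows+1) x (cols+1) summed area table with a zero border.

    Assumption (hypothetical input): `counts` is a non-empty rectangular grid
    of pre-binned counts. Afterwards sat[i][j] = sum of counts[r][c] for r < i, c < j.
    """
    rows, cols = len(counts), len(counts[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Inclusion-exclusion: add the cell, the prefix above, the prefix
            # to the left, and subtract the doubly counted upper-left prefix.
            sat[i + 1][j + 1] = counts[i][j] + sat[i][j + 1] + sat[i + 1][j] - sat[i][j]
    return sat


def range_sum(sat: Grid, r0: int, c0: int, r1: int, c1: int) -> int:
    """Sum of counts over rows [r0, r1) and cols [c0, c1) using four lookups."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


def rebin(sat: Grid, row_edges: List[int], col_edges: List[int]) -> Grid:
    """Aggregate the fine grid into coarser bins bounded by the given edge indices."""
    return [
        [
            range_sum(sat, row_edges[a], col_edges[b], row_edges[a + 1], col_edges[b + 1])
            for b in range(len(col_edges) - 1)
        ]
        for a in range(len(row_edges) - 1)
    ]


# Hypothetical 4 x 4 fine grid of counts (e.g., a binned scatterplot).
fine = [
    [1, 0, 2, 1],
    [0, 3, 1, 0],
    [2, 1, 0, 4],
    [1, 1, 1, 1],
]
sat = build_sat(fine)
print(rebin(sat, [0, 2, 4], [0, 2, 4]))  # [[4, 4], [5, 6]]
print(rebin(sat, [0, 1, 4], [0, 3, 4]))  # [[3, 1], [10, 5]]
```

Building the table is linear in the number of grid cells, and each rebinned cell then costs constant time regardless of how many fine cells it covers, which is what makes interactive re-binning cheap in this simplified setting.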

Comments:	VIS 2019 (InfoVis) accepted
Subjects:	Databases (cs.DB); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:1908.02005 [cs.DB]
	(or arXiv:1908.02005v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1908.02005
Related DOI:	https://doi.org/10.1109/TVCG.2019.2934800

Submission history

From: Honghui Mei
[v1] Tue, 6 Aug 2019 08:17:43 UTC (2,636 KB)
[v2] Fri, 11 Oct 2019 01:17:08 UTC (2,386 KB)