-
Two-layer Space-oriented Partitioning for Non-point Data
Authors:
Dimitrios Tsitsigkos,
Panagiotis Bouros,
Konstantinos Lampropoulos,
Nikos Mamoulis,
Manolis Terrovitis
Abstract:
Non-point spatial objects (e.g., polygons and linestrings) are ubiquitous. We study the problem of indexing non-point objects in memory for range queries and spatial intersection joins. We propose a secondary partitioning technique for space-oriented partitioning indices (e.g., grids), which improves their performance significantly, by avoiding the generation and elimination of duplicate results. Our approach is easy to implement and can be used by any space-partitioning index to significantly reduce the cost of range queries and intersection joins. In addition, the secondary partitions can be processed independently, which makes our method appropriate for distributed and parallel indexing. Experiments on real datasets confirm the advantage of our approach against alternative duplicate elimination techniques and data-oriented state-of-the-art spatial indices. We also show that our partitioning technique, paired with optimized partition-to-partition join algorithms, typically reduces the cost of spatial joins by around 50%.
Submitted 18 July, 2023;
originally announced July 2023.
-
HINT: A Hierarchical Index for Intervals in Main Memory
Authors:
George Christodoulou,
Panagiotis Bouros,
Nikos Mamoulis
Abstract:
Indexing intervals is a fundamental problem, finding a wide range of applications. Recent work on managing large collections of intervals in main memory focused on overlap joins and temporal aggregation problems. In this paper, we propose novel and efficient in-memory indexing techniques for intervals, with a focus on interval range queries, which are a basic component of many search and analysis tasks. First, we propose an optimized version of a single-level (flat) domain-partitioning approach, which may have large space requirements due to excessive replication. Then, we propose a hierarchical partitioning approach, which assigns each interval to at most two partitions per level and has controlled space requirements. Novel elements of our techniques include the division of the intervals at each partition into groups based on whether they begin inside or before the partition boundaries, reducing the information stored at each partition to the absolutely necessary, and the effective handling of data sparsity and skew. Experimental results on real and synthetic interval sets of different characteristics show that our approaches are typically one order of magnitude faster than the state-of-the-art.
Submitted 7 March, 2022; v1 submitted 22 April, 2021;
originally announced April 2021.
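The abstract above states that the hierarchical approach assigns each interval to at most two partitions per level. A minimal sketch of one way to obtain such an assignment is shown below, assuming a discretized domain of 2**m cells and a bottom-up decomposition in the style of segment trees. The function names `assign` and `cells_covered`, the level numbering (level m is the finest), and the decomposition itself are illustrative assumptions, not necessarily the paper's exact procedure.

```python
def assign(start, end, m):
    """Decompose the cell range [start, end] of a 2**m-cell domain.

    Returns a list of (level, partition) pairs. Level m is the finest
    level (2**m partitions); level 0 has a single partition.
    Hypothetical helper for illustration only.
    """
    out = []
    a, b, level = start, end, m
    while level >= 0 and a <= b:
        if a % 2 == 1:          # a is a right child: it cannot be merged upward
            out.append((level, a))
            a += 1
        if b % 2 == 0:          # b is a left child: it cannot be merged upward
            out.append((level, b))
            b -= 1
        a //= 2                 # move to the parents of the remaining range
        b //= 2
        level -= 1
    return out


def cells_covered(level, partition, m):
    """Finest-level cells covered by a partition at the given level."""
    width = 2 ** (m - level)
    return range(partition * width, (partition + 1) * width)


print(assign(5, 9, 4))  # [(4, 5), (3, 3), (3, 4)]
```

In this example the interval over cells 5 to 9 is stored once at the finest level (cell 5) and twice one level up (cells 6 to 7 and 8 to 9), so no level receives more than two entries.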
-
A Two-level Spatial In-Memory Index
Authors:
Dimitrios Tsitsigkos,
Konstantinos Lampropoulos,
Panagiotis Bouros,
Nikos Mamoulis,
Manolis Terrovitis
Abstract:
Very large volumes of spatial data increasingly become available and demand effective management. While there have been decades of research on spatial data management, few works consider the current state of commodity hardware, having relatively large memory and the ability of parallel multi-core processing. In this work, we re-consider the design of spatial indexing under this new reality. Specifically, we propose a main-memory indexing approach for objects with spatial extent, which is based on a classic regular space partitioning into disjoint tiles. The novelty of our index is that the contents of each tile are further partitioned into four classes. This second-level partitioning not only reduces the number of comparisons required to compute the results, but also avoids the generation and elimination of duplicate results, which is an inherent problem of spatial indexes based on disjoint space partitioning. The spatial partitions defined by our indexing scheme are totally independent, facilitating effortless parallel evaluation, as no synchronization or communication between the partitions is necessary. We show how our index can be used to efficiently process spatial range queries and drastically reduce the cost of the refinement step of the queries. In addition, we study the efficient processing of numerous range queries in batch and in parallel. Extensive experiments on real datasets confirm the efficiency of our approaches.
Submitted 23 February, 2021; v1 submitted 18 May, 2020;
originally announced May 2020.
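The second-level partitioning described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: objects are axis-aligned rectangles `(xlo, ylo, xhi, yhi)`, the grid is uniform over a square domain, and the four classes are taken to record whether a rectangle begins inside or before the tile in each dimension. The class labels A to D and the name `TwoLevelGrid` are hypothetical.

```python
from collections import defaultdict


class TwoLevelGrid:
    """Uniform grid whose tiles split their contents into four classes.

    Assumed class meaning (illustrative labels):
      A: rectangle begins inside the tile in both x and y
      B: begins inside in x, before the tile in y
      C: begins before the tile in x, inside in y
      D: begins before the tile in both x and y
    """

    def __init__(self, n, extent):
        self.n = n
        self.cell = extent / n
        self.tiles = defaultdict(lambda: {"A": [], "B": [], "C": [], "D": []})

    def _idx(self, v):
        # Tile coordinate of a value, clamped to the grid.
        return min(self.n - 1, max(0, int(v // self.cell)))

    def insert(self, oid, rect):
        xlo, ylo, xhi, yhi = rect
        cx0, cy0 = self._idx(xlo), self._idx(ylo)
        for cx in range(cx0, self._idx(xhi) + 1):
            for cy in range(cy0, self._idx(yhi) + 1):
                starts_x, starts_y = cx == cx0, cy == cy0
                if starts_x and starts_y:
                    cls = "A"
                elif starts_x:
                    cls = "B"
                elif starts_y:
                    cls = "C"
                else:
                    cls = "D"
                self.tiles[(cx, cy)][cls].append((oid, rect))

    def range_query(self, window):
        xlo, ylo, xhi, yhi = window
        qx0, qy0 = self._idx(xlo), self._idx(ylo)
        out = []
        for cx in range(qx0, self._idx(xhi) + 1):
            for cy in range(qy0, self._idx(yhi) + 1):
                tile = self.tiles.get((cx, cy))
                if tile is None:
                    continue
                # A class is scanned only if the object could not have been
                # reported already in a previous tile of the same window.
                classes = ["A"]
                if cy == qy0:
                    classes.append("B")
                if cx == qx0:
                    classes.append("C")
                if cx == qx0 and cy == qy0:
                    classes.append("D")
                for cls in classes:
                    for oid, r in tile[cls]:
                        if r[0] <= xhi and xlo <= r[2] and r[1] <= yhi and ylo <= r[3]:
                            out.append(oid)
        return out


grid = TwoLevelGrid(n=4, extent=8.0)
grid.insert(1, (1, 1, 5, 5))   # replicated into nine tiles
grid.insert(2, (6, 6, 7, 7))
grid.insert(3, (0, 6, 1, 7))
print(sorted(grid.range_query((0, 0, 7.5, 7.5))))  # [1, 2, 3]
```

Although rectangle 1 is stored in nine tiles, the query reports it once and no duplicate-elimination step runs afterwards. Because each tile is scanned independently, the per-tile loop could also be distributed across threads.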
-
Parallel In-Memory Evaluation of Spatial Joins
Authors:
Dimitrios Tsitsigkos,
Panagiotis Bouros,
Nikos Mamoulis,
Manolis Terrovitis
Abstract:
The spatial join is a popular operation in spatial database systems and its evaluation is a well-studied problem. As main memories become bigger and faster and commodity hardware supports parallel processing, there is a need to revamp classic join algorithms which have been designed for I/O-bound processing. In view of this, we study the in-memory and parallel evaluation of spatial joins, by re-designing a classic partitioning-based algorithm to consider alternative approaches for space partitioning. Our study shows that, compared to a straightforward implementation of the algorithm, our tuning can improve performance significantly. We also show how to select appropriate partitioning parameters based on data statistics, in order to tune the algorithm for the given join inputs. Our parallel implementation scales gracefully with the number of threads, reducing the cost of the join to at most one second even for join inputs with tens of millions of rectangles.
Submitted 9 October, 2019; v1 submitted 30 August, 2019;
originally announced August 2019.
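For orientation, a partitioning-based spatial join of the kind the abstract above refers to can be sketched as below. This is a generic illustration and not the tuned algorithm of the paper: it assumes a uniform grid, a nested-loop join inside each tile, and the well-known reference-point test to suppress duplicate pairs. The function name `grid_join` and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import product


def grid_join(R, S, n, extent):
    """Intersection join of two rectangle collections over an n x n grid.

    R, S: dict mapping an id to (xlo, ylo, xhi, yhi).
    Returns a list of (r_id, s_id) pairs, each reported exactly once.
    """
    cell = extent / n

    def idx(v):
        return min(n - 1, max(0, int(v // cell)))

    def partition(objs):
        tiles = defaultdict(list)
        for oid, (xlo, ylo, xhi, yhi) in objs.items():
            for cx in range(idx(xlo), idx(xhi) + 1):
                for cy in range(idx(ylo), idx(yhi) + 1):
                    tiles[(cx, cy)].append(oid)
        return tiles

    tiles_r, tiles_s = partition(R), partition(S)
    out = []
    # Each tile is joined independently, so this loop is a natural unit
    # of parallel work.
    for tile in tiles_r.keys() & tiles_s.keys():
        for r, s in product(tiles_r[tile], tiles_s[tile]):
            a, b = R[r], S[s]
            if a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]:
                # Reference point: lower-left corner of the intersection.
                # Report the pair only in the tile that contains it.
                ref = (idx(max(a[0], b[0])), idx(max(a[1], b[1])))
                if ref == tile:
                    out.append((r, s))
    return out
```

The grid resolution `n` is the main tuning knob in this sketch: finer grids reduce the comparisons per tile but increase replication.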
-
Finding k-Dissimilar Paths with Minimum Collective Length
Authors:
Theodoros Chondrogiannis,
Panagiotis Bouros,
Johann Gamper,
Ulf Leser,
David B. Blumenthal
Abstract:
Shortest path computation is a fundamental problem in road networks. However, in many real-world scenarios, determining solely the shortest path is not enough. In this paper, we study the problem of finding k-Dissimilar Paths with Minimum Collective Length (kDPwML), which aims at computing a set of paths from a source s to a target t such that all paths are pairwise dissimilar by at least θ and the sum of the path lengths is minimal. We introduce an exact algorithm for the kDPwML problem, which iterates over all possible s-t paths while employing two pruning techniques to reduce the prohibitively expensive computational cost. To achieve scalability, we also define the much smaller set of the simple single-via paths, and we adapt two algorithms for kDPwML queries to iterate over this set. Our experimental analysis on real road networks shows that iterating over all paths is impractical, while iterating over the set of simple single-via paths can lead to scalable solutions with only a small trade-off in the quality of the results.
Submitted 24 October, 2018; v1 submitted 18 September, 2018;
originally announced September 2018.
-
Set Containment Join Revisited
Authors:
Panagiotis Bouros,
Nikos Mamoulis,
Shen Ge,
Manolis Terrovitis
Abstract:
Given two collections of set objects $R$ and $S$, the $R \bowtie_{\subseteq} S$ set containment join returns all object pairs $(r, s) \in R \times S$ such that $r \subseteq s$. Besides being a basic operator in all modern data management systems with a wide range of applications, the join can be used to evaluate complex SQL queries based on relational division and as a module of data mining algorithms. The state-of-the-art algorithm for set containment joins (PRETTI) builds an inverted index on the right-hand collection $S$ and a prefix tree on the left-hand collection $R$ that groups set objects with common prefixes and thus, avoids redundant processing. In this paper, we present a framework which improves PRETTI in two directions. First, we limit the prefix tree construction by proposing an adaptive methodology based on a cost model; this way, we can greatly reduce the space and time cost of the join. Second, we partition the objects of each collection based on their first contained item, assuming that the set objects are internally sorted. We show that we can process the partitions and evaluate the join while building the prefix tree and the inverted index progressively. This allows us to significantly reduce not only the join cost, but also the maximum memory requirements during the join. An experimental evaluation using both real and synthetic datasets shows that our framework outperforms PRETTI by a wide margin.
Submitted 17 March, 2016;
originally announced March 2016.
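The baseline described in the abstract above (an inverted index on $S$ combined with a prefix tree on $R$) can be sketched as follows. This is a simplified illustration of that baseline idea only; it does not include the adaptive prefix tree construction or the partitioning that the paper proposes. The function name `containment_join` and the dictionary-based tree layout are assumptions made for this example.

```python
from collections import defaultdict


def containment_join(R, S):
    """Return all (r_id, s_id) pairs such that R[r_id] is a subset of S[s_id].

    R, S: dict mapping an id to a set of comparable items.
    """
    # Inverted index on S: item -> ids of the sets in S that contain it.
    postings = defaultdict(set)
    for sid, items in S.items():
        for item in items:
            postings[item].add(sid)

    # Prefix tree on R: sets sharing a sorted prefix share a path.
    root = {"children": {}, "ids": []}
    for rid, items in R.items():
        node = root
        for item in sorted(items):
            node = node["children"].setdefault(item, {"children": {}, "ids": []})
        node["ids"].append(rid)

    # Depth-first traversal: the candidate list is intersected once per
    # tree node, so a shared prefix is processed a single time.
    results = set()
    stack = [(root, set(S))]
    while stack:
        node, candidates = stack.pop()
        for rid in node["ids"]:
            results.update((rid, sid) for sid in candidates)
        for item, child in node["children"].items():
            narrowed = candidates & postings.get(item, set())
            if narrowed:
                stack.append((child, narrowed))
    return results
```

In this sketch the whole prefix tree and inverted index are built before the join starts, which is the space and time cost that the abstract's framework aims to reduce.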
-
Routing Directions: Keeping it Fast and Simple
Authors:
Dimitris Sacharidis,
Panagiotis Bouros
Abstract:
The problem of providing meaningful routing directions over road networks is of great importance. In many real-life cases, the fastest route may not be the ideal choice for providing directions in written or spoken text, for an unfamiliar neighborhood, or in cases of emergency. Rather, it is often preferable to offer "simple" directions that are easy to memorize, explain, understand or follow. However, there exist cases where the simplest route is considerably longer than the fastest. This paper tries to address this issue, by finding near-simplest routes which are as short as possible and near-fastest routes which are as simple as possible. Particularly, we focus on efficiency, and propose novel algorithms, which are theoretically and experimentally shown to be significantly faster than existing approaches.
Submitted 17 September, 2013;
originally announced September 2013.