HPTMT: Operator-Based Architecture for Scalable High-Performance Data-Intensive Frameworks

Kamburugamuve, Supun; Widanage, Chathura; Perera, Niranda; Abeykoon, Vibhatha; Uyar, Ahmet; Kanewala, Thejaka Amila; von Laszewski, Gregor; Fox, Geoffrey

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2107.12807 (cs)

[Submitted on 27 Jul 2021 (v1), last revised 30 Jul 2021 (this version, v2)]

Title: HPTMT: Operator-Based Architecture for Scalable High-Performance Data-Intensive Frameworks

Authors: Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox


Abstract: Data-intensive applications impact many domains, and their steadily increasing size and complexity demand high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering frameworks. These frameworks employ a set of operators on specific data abstractions that include vectors, matrices, tensors, graphs, and tables. Our key concepts are inspired by systems like MPI, HPF (High-Performance Fortran), NumPy, Pandas, Spark, Modin, PyTorch, TensorFlow, RAPIDS (NVIDIA), and OneAPI (Intel). Further, it is crucial to support the different languages in everyday use in the Big Data arena, including Python, R, C++, and Java. We note the importance of Apache Arrow and Parquet for enabling language-agnostic high performance and interoperability. In this paper, we propose High-Performance Tensors, Matrices and Tables (HPTMT), an operator-based architecture for data-intensive applications, and identify the fundamental principles needed for performance and usability success. We illustrate these principles by discussing examples using our software environments, Cylon and Twister2, which embody HPTMT.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2107.12807 [cs.DC]
	(or arXiv:2107.12807v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2107.12807

Submission history

From: Supun Kamburugamuve
[v1] Tue, 27 Jul 2021 13:28:34 UTC (577 KB)
[v2] Fri, 30 Jul 2021 01:12:23 UTC (577 KB)
