-
Toward matrix multiplication for deep learning inference on the Xilinx Versal
Authors:
Jie Lei,
José Flich,
Enrique S. Quintana-Ortí
Abstract:
The remarkable positive impact of Deep Neural Networks on many Artificial Intelligence (AI) tasks has led to the development of various high-performance algorithms as well as specialized processors and accelerators. In this paper we address this scenario by demonstrating that the principles underlying the modern realization of the general matrix multiplication (GEMM) in conventional processor architectures are also valid for achieving high performance on the type of operations that arise in deep learning (DL) when run on an exotic accelerator such as the AI Engine (AIE) tile embedded in Xilinx Versal platforms. In particular, our experimental results with a prototype implementation of the GEMM kernel on a Xilinx Versal VCK190 show performance close to 86.7% of the theoretical peak that can be expected on an AIE tile for 16-bit integer operands.
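Assuming the "principles underlying the modern realization of GEMM" are the BLIS/GotoBLAS-style scheme, they amount to five nested loops that partition the operands into blocks sized for the memory hierarchy, wrapped around a small micro-kernel that updates an mr × nr tile of C. The Python sketch below illustrates only that loop structure for 16-bit integer operands with 32-bit accumulation. The block sizes NC, KC, MC, MR and NR are hypothetical placeholders, operand packing is omitted, and nothing here models the AIE tile or reproduces the authors' kernel.

```python
import numpy as np

# Hypothetical block sizes; a real implementation would tune these to the
# target's local memories and vector unit, which this sketch does not model.
NC, KC, MC = 64, 32, 16
MR, NR = 4, 4


def micro_kernel(a_panel, b_panel, c_tile):
    """Update an (mr x nr) view of C with a rank-kc product.

    a_panel: (mr, kc) int16, b_panel: (kc, nr) int16, c_tile: (mr, nr) int32.
    Operands are widened to int32 so int16 products do not overflow.
    """
    c_tile += a_panel.astype(np.int32) @ b_panel.astype(np.int32)


def blocked_gemm(A, B):
    """Compute C = A @ B with BLIS-style blocking (packing omitted)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=np.int32)
    for jc in range(0, n, NC):                 # loop 5: column blocks of B and C
        nc = min(NC, n - jc)
        for pc in range(0, k, KC):             # loop 4: blocks of the shared dimension
            kc = min(KC, k - pc)
            Bc = B[pc:pc + kc, jc:jc + nc]     # would be packed into a buffer
            for ic in range(0, m, MC):         # loop 3: row blocks of A and C
                mc = min(MC, m - ic)
                Ac = A[ic:ic + mc, pc:pc + kc]  # would be packed into a buffer
                for jr in range(0, nc, NR):    # loop 2: micro-panels of Bc
                    nr = min(NR, nc - jr)
                    for ir in range(0, mc, MR):  # loop 1: micro-panels of Ac
                        mr = min(MR, mc - ir)
                        # Basic slicing yields a view, so the kernel updates C in place.
                        micro_kernel(Ac[ir:ir + mr, :], Bc[:, jr:jr + nr],
                                     C[ic + ir:ic + ir + mr, jc + jr:jc + jr + nr])
    return C
```

Comparing `blocked_gemm(A, B)` against `A.astype(np.int32) @ B.astype(np.int32)` on random int16 inputs whose sizes are not multiples of the block sizes is a quick check that the blocking covers every element of C exactly once, ragged edge blocks included.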
Submitted 15 February, 2023;
originally announced February 2023.
-
Enabling Dynamic and Intelligent Workflows for HPC, Data Analytics, and AI Convergence
Authors:
Jorge Ejarque,
Rosa M. Badia,
Loïc Albertin,
Giovanni Aloisio,
Enrico Baglione,
Yolanda Becerra,
Stefan Boschert,
Julian R. Berlin,
Alessandro D'Anca,
Donatello Elia,
François Exertier,
Sandro Fiore,
José Flich,
Arnau Folch,
Steven J Gibbons,
Nikolay Koldunov,
Francesc Lordan,
Stefano Lorito,
Finn Løvholt,
Jorge Macías,
Fabrizio Marozzo,
Alberto Michelini,
Marisol Monterrubio-Velasco,
Marta Pienkowska,
Josep de la Puente,
et al. (12 additional authors not shown)
Abstract:
The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications in these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target simulations and modelling of physical phenomena, current needs additionally require data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of proper programming models and environments that support the integration of HPC, DA, and AI, as well as the lack of tools to easily deploy and execute the workflows in HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for HPC/DA/AI convergence. Based on this study, the paper identifies the challenges that a new workflow platform for managing complex workflows must meet. Finally, it proposes a development approach for such a platform that addresses these challenges in two directions: first, by defining a software stack that provides the functionality needed to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reuse of complex workflows in federated HPC infrastructures. The proposals presented in this work are being studied and developed as part of the EuroHPC eFlows4HPC project.
Submitted 13 May, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
TDSR: Transparent Distributed Segment-Based Routing
Authors:
Juan-José Crespo,
German Maglione-Mathey,
José L. Sánchez,
Francisco J. Alfaro-Cortés,
José Flich
Abstract:
Component reliability and performance pose a great challenge for interconnection networks. Future technology scaling, such as growing transistor integration capacity in VLSI design, will result in higher device degradation and manufacturing variability. As a consequence, the network changes over time, often yielding irregular topologies. This paper proposes a topology-agnostic distributed segment-based algorithm able to handle switch discovery in any topology while guaranteeing connectivity among switches. The proposal, known as Transparent Distributed Segment-Based Routing (TDSR), has been applied to meshes with defective link configurations.
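To make the setting concrete, the Python sketch below builds a 2D mesh, removes a set of defective links, and lets every switch discover the switches it can still reach by synchronous flooding: in each round, a switch merges the sets known by its live neighbours. This is a generic discovery illustration under simplifying assumptions (lockstep rounds, no message loss), not TDSR's segment-based algorithm; the helper names `mesh_links` and `flood_discovery` are hypothetical.

```python
from itertools import product


def mesh_links(rows, cols, faulty):
    """Adjacency of a rows x cols mesh with the faulty links removed.

    Switches are (row, col) tuples; `faulty` is a set of frozensets {u, v}
    naming bidirectional links that are assumed to be down.
    """
    adj = {(r, c): set() for r, c in product(range(rows), range(cols))}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):  # east and south neighbours
            v = (r + dr, c + dc)
            if v in adj and frozenset(((r, c), v)) not in faulty:
                adj[(r, c)].add(v)
                adj[v].add((r, c))
    return adj


def flood_discovery(adj):
    """Synchronous flooding over live links.

    Returns the set of switches known by each switch once no set changes,
    and the number of rounds in which at least one set grew.
    """
    known = {s: {s} for s in adj}
    rounds = 0
    changed = True
    while changed:
        changed = False
        new_known = {}
        for s in adj:
            merged = set(known[s])
            for nb in adj[s]:  # only neighbours reachable over live links
                merged |= known[nb]
            new_known[s] = merged
            if merged != known[s]:
                changed = True
        known = new_known
        if changed:
            rounds += 1
    return known, rounds
```

On a fault-free mesh every switch ends up knowing all others after a number of rounds equal to the mesh diameter; cutting both links of a corner switch leaves it isolated while the remaining switches still discover each other, which is the kind of irregular topology the abstract describes.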
Submitted 4 June, 2020;
originally announced June 2020.
-
UPR: Deadlock-Free Dynamic Network Reconfiguration by Exploiting Channel Dependency Graph Compatibility
Authors:
Juan-José Crespo,
José L. Sánchez,
Francisco J. Alfaro-Cortés,
José Flich,
José Duato
Abstract:
Deadlock-free dynamic network reconfiguration is usually studied from the perspective of routing algorithm restrictions and resource reservation. The dynamic nature of the transition from one routing function to another is often managed by restricting resource usage in a static, predefined manner, which often limits the supported routing algorithms and/or inactive link patterns, or else requires additional resources such as virtual channels. Exploiting the compatibility between routing functions by examining their associated Channel Dependency Graphs (CDGs) can greatly benefit from the dynamic nature of the reconfiguration process. In this paper, we propose a new dynamic reconfiguration process called Upstream Progressive Reconfiguration (UPR). Our algorithm progressively adds and removes dependencies on a per-channel basis, relying on the information provided by the CDG while the reconfiguration process takes place. This allows us to foresee compatible scenarios where both routing functions coexist, reducing resource drainage as well as packet injection halting.
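For deterministic routing, the classic Dally and Seitz result is that a routing function is deadlock-free if its CDG (channels as vertices, an edge c1 → c2 whenever a packet holding c1 may request c2) is acyclic. The Python sketch below applies that idea to reconfiguration in a deliberately simplified form: it treats the old and new routing functions as able to coexist if the union of their dependencies is still acyclic. This whole-graph union check is an assumption made for illustration; UPR itself works progressively on a per-channel basis. The channel names and the helpers `has_cycle`, `union` and `can_coexist` are hypothetical.

```python
def has_cycle(cdg):
    """Detect a cycle in a CDG given as {channel: set of successor channels}.

    Uses an iterative three-colour depth-first search.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(cdg) | {v for succ in cdg.values() for v in succ}
    colour = {n: WHITE for n in nodes}
    for root in nodes:
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(cdg.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if colour[nxt] == GREY:  # back edge: dependency cycle
                    return True
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(cdg.get(nxt, ()))))
                    break
            else:  # all successors explored
                colour[node] = BLACK
                stack.pop()
    return False


def union(a, b):
    """Union of two CDGs, without mutating either input."""
    out = {k: set(v) for k, v in a.items()}
    for k, v in b.items():
        out.setdefault(k, set()).update(v)
    return out


def can_coexist(old, new):
    """Simplified compatibility test: both routing functions may be active
    at the same time if their combined dependencies form no cycle."""
    return not has_cycle(union(old, new))
```

As a usage example, consider a unidirectional 4-node ring with channels c0 to c3. An old routing function with dependencies c0 → c1 → c2 → c3 and a new one with c1 → c2 → c3 → c0 are each acyclic, but their union closes the ring, so the simplified check reports that they cannot coexist; a new function that only drops dependencies (for example c0 → c1 → c2) passes.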
Submitted 21 January, 2021; v1 submitted 3 June, 2020;
originally announced June 2020.