-
Finite-Time Lyapunov Exponent Calculation on FPGA using High-Level Synthesis Tools
Authors:
Manuel de Castro,
Roberto R. Osorio,
Francisco J. Andujar,
Rocío Carratalá-Sáez,
Yuri Torres,
Diego R. Llanos
Abstract:
As the computing capabilities of Field Programmable Gate Arrays (FPGAs) continue to grow, so does the interest in building scientific accelerators around them. Tools like Xilinx's High-Level Synthesis (HLS) help to bridge the gap between traditional high-level languages such as C and C++ and low-level hardware description languages such as VHDL and Verilog. In this report, we study the implementation of a fluid dynamics application, the Finite-Time Lyapunov Exponent (FTLE) calculation, on FPGA using HLS. We provide speed and resource-consumption results for 2- and 3-dimensional cases.
Submitted 5 August, 2024;
originally announced August 2024.
-
Open SYCL on heterogeneous GPU systems: A case of study
Authors:
Rocío Carratalá-Sáez,
Francisco J. Andújar,
Yuri Torres,
Arturo Gonzalez-Escribano,
Diego R. Llanos
Abstract:
Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require efficient and careful management of the computational resources of this type of hardware to obtain the best possible performance. However, heterogeneous clusters and machines currently combine GPUs from different vendors, architectures and families. Programming with vendor-provided languages or frameworks, and optimizing for specific devices, may become cumbersome and compromise portability to other systems. To overcome this problem, several proposals for high-level heterogeneous programming have appeared, trying to reduce the development effort and increase functional and performance portability, specifically when using GPU hardware accelerators.
This paper evaluates the SYCL programming model, using the Open SYCL compiler, from two different perspectives: the performance it offers when dealing with single or multiple GPU devices from the same or different vendors, and the development effort required to implement the code. As a case study, we use the Finite Time Lyapunov Exponent calculation over two real-world scenarios and compare the performance and development effort of its Open SYCL-based version against equivalent versions that use CUDA or HIP.
Based on the experimental results, we observe that the use of SYCL does not lead to a remarkable overhead in GPU kernel execution time. In general, the Open SYCL development effort for the host code is lower than that observed with CUDA or HIP. Moreover, the SYCL version can take advantage of both CUDA and AMD GPU devices simultaneously much more easily than directly using the vendor-specific programming solutions.
Submitted 10 October, 2023;
originally announced October 2023.
-
Task-based preemptive scheduling on FPGAs leveraging partial reconfiguration
Authors:
Gabriel Rodriguez-Canal,
Nick Brown,
Yuri Torres,
Arturo Gonzalez-Escribano
Abstract:
FPGAs are an attractive type of accelerator for all-purpose HPC computing systems due to the possibility of deploying tailored hardware on demand. However, the common tools for programming and operating FPGAs are still complex to use, especially in scenarios where diverse types of tasks should be dynamically executed. In this work we present a programming abstraction with a simple interface that internally leverages High-Level Synthesis, Dynamic Partial Reconfiguration and synchronisation mechanisms to use an FPGA as a multi-tasking server with preemptive scheduling and priority queues. This leads to an improved use of the FPGA resources, allowing the execution of several different kernels concurrently and deploying the most urgent ones as fast as possible.
The results of our experimental study show that our approach incurs only a 10% overhead in the worst case when using two reconfigurable regions, whilst providing a significant performance improvement of at least 24% over the traditional full reconfiguration approach.
Submitted 18 January, 2023;
originally announced January 2023.
-
Programming abstractions for preemptive scheduling in FPGAs using partial reconfiguration
Authors:
Gabriel Rodriguez-Canal,
Nick Brown,
Yuri Torres,
Arturo Gonzalez-Escribano
Abstract:
FPGAs are an attractive type of accelerator for all-purpose HPC computing systems due to the possibility of deploying tailored hardware on demand. However, the common tools for programming and operating FPGAs are still complex to use, especially in scenarios where diverse types of tasks should be dynamically executed. In this work we present a programming abstraction with a simple interface that internally leverages High-Level Synthesis, Dynamic Partial Reconfiguration and synchronisation mechanisms to use an FPGA as a multi-tasking server with preemptive scheduling and priority queues. This leads to a better use of the FPGA resources, allowing the execution of several kernels at the same time and deploying the most urgent ones as fast as possible. The results of our experimental study show that our approach incurs only a 1.66% overhead when using one Reconfigurable Region (RR), and 4.04% when using two RRs, whilst presenting a significant performance improvement over the traditional non-preemptive full reconfiguration approach.
Submitted 26 August, 2022;
originally announced September 2022.
-
Floods impact dynamics quantified from big data sources
Authors:
David Pastor-Escuredo,
Yolanda Torres,
Maria Martinez,
Pedro J. Zufiria
Abstract:
Natural disasters affect hundreds of millions of people worldwide every year. Early warning, humanitarian response and recovery mechanisms can be improved by using big data sources. Measuring the different dimensions of the impact of natural disasters is critical for designing policies and building up resilience. Detailed quantification of the movement and behaviours of affected populations requires the use of high-granularity data that entails privacy risks. Leveraging all this data is costly and has to be done while ensuring the privacy and security of large amounts of data. Proxies based on social media and data aggregates would streamline this process by providing evidence and narrowing requirements. We propose a framework that integrates environmental data, social media, remote sensing, digital topography and mobile phone data to understand different types of floods and how data can provide insights useful for managing humanitarian action and recovery plans. Thus, data is dynamically requested based on data-driven indicators, forming a multi-granularity and multi-access data pipeline. We present a composed study of three cases to show potential variability in the nature of floods, as well as the impact and applicability of data sources. Critical heterogeneity of the available data in the different cases has to be addressed in order to design systematic approaches based on data. The proposed framework establishes the foundation to relate the physical and socio-economic impacts of floods.
Submitted 24 April, 2018;
originally announced April 2018.
-
An Efficient Polyphase Filter Based Resampling Method for Unifying the PRFs in SAR Data
Authors:
Yoangel Torres,
Kamal Premaratne,
Falk Amelung,
Shimon Wdowinski
Abstract:
Variable and higher pulse repetition frequencies (PRFs) are increasingly being used to meet the stricter requirements and complexities of current airborne and spaceborne synthetic aperture radar (SAR) systems associated with higher resolution and wider area products. POLYPHASE, the proposed resampling scheme, downsamples and unifies variable PRFs within a single look complex (SLC) SAR acquisition and across a repeat pass sequence of acquisitions down to an effective lower PRF. A sparsity condition of the received SAR data ensures that the uniformly resampled data approximates the spectral properties of a decimated densely sampled version of the received SAR data. While experiments conducted with both synthetically generated and real airborne SAR data show that POLYPHASE retains comparable performance to the state-of-the-art BLUI scheme in image quality, a polyphase filter-based implementation of POLYPHASE offers significant computational savings for arbitrary (not necessarily periodic) input PRF variations, thus allowing fully on-board, in-place, and real-time implementation.
Submitted 23 May, 2017; v1 submitted 22 October, 2015;
originally announced October 2015.