-
Coyote v2: Raising the Level of Abstraction for Data Center FPGAs
Authors:
Benjamin Ramhorst,
Dario Korolija,
Maximilian Jakob Heer,
Jonas Dann,
Luhao Liu,
Gustavo Alonso
Abstract:
In the trend towards hardware specialization, FPGAs play a dual role as accelerators for offloading, e.g., network virtualization, and as a vehicle for prototyping and exploring hardware designs. While FPGAs offer versatility and performance, integrating them in larger systems remains challenging. Thus, recent efforts have focused on raising the level of abstraction through better interfaces and h…
▽ More
In the trend towards hardware specialization, FPGAs play a dual role as accelerators for offloading, e.g., network virtualization, and as a vehicle for prototyping and exploring hardware designs. While FPGAs offer versatility and performance, integrating them in larger systems remains challenging. Thus, recent efforts have focused on raising the level of abstraction through better interfaces and high-level programming languages. Yet, there is still quite some room for improvement. In this paper, we present Coyote v2, an open source FPGA shell built with a novel, three-layer hierarchical design supporting dynamic partial reconfiguration of services and user logic, with a unified logic interface, and high-level software abstractions such as support for multithreading and multitenancy. Experimental results indicate Coyote v2 reduces synthesis times between 15% and 20% and run-time reconfiguration times by an order of magnitude, when compared to existing systems. We also demonstrate the advantages of Coyote v2 by deploying several realistic applications, including HyperLogLog cardinality estimation, AES encryption, and neural network inference. Finally, Coyote v2 places a great deal of emphasis on integration with real systems through reusable and reconfigurable services, including a fully RoCE v2-compliant networking stack, a shared virtual memory model with the host, and a DMA engine between FPGAs and GPUs. We demonstrate these features by, e.g., seamlessly deploying an FPGA-accelerated neural network from Python.
△ Less
Submitted 30 April, 2025;
originally announced April 2025.
-
GraphMatch: Subgraph Query Processing on FPGAs
Authors:
Jonas Dann,
Tobias Götz,
Daniel Ritter,
Jana Giceva,
Holger Fröning
Abstract:
Efficiently finding subgraph embeddings in large graphs is crucial for many application areas like biology and social network analysis. Set intersections are the predominant and most challenging aspect of current join-based subgraph query processing systems for CPUs. Previous work has shown the viability of utilizing FPGAs for acceleration of graph and join processing.
In this work, we propose G…
▽ More
Efficiently finding subgraph embeddings in large graphs is crucial for many application areas like biology and social network analysis. Set intersections are the predominant and most challenging aspect of current join-based subgraph query processing systems for CPUs. Previous work has shown the viability of utilizing FPGAs for acceleration of graph and join processing.
In this work, we propose GraphMatch, the first genearl-purpose stand-alone subgraph query processing accelerator based on worst-case optimal joins (WCOJ) that is fully designed for modern, field programmable gate array (FPGA) hardware. For efficient processing of various graph data sets and query graph patterns, it leverages a novel set intersection approach, called AllCompare, tailor-made for FPGAs. We show that this set intersection approach efficiently solves multi-set intersections in subgraph query processing, superior to CPU-based approaches. Overall, GraphMatch achieves a speedup of over 2.68x and 5.16x, compared to the state-of-the-art systems GraphFlow and RapidMatch, respectively.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Automated control and optimisation of laser driven ion acceleration
Authors:
B. Loughran,
M. J. V. Streeter,
H. Ahmed,
S. Astbury,
M. Balcazar,
M. Borghesi,
N. Bourgeois,
C. B. Curry,
S. J. D. Dann,
S. DiIorio,
N. P. Dover,
T. Dzelzanis,
O. C. Ettlinger,
M. Gauthier,
L. Giuffrida,
G. D. Glenn,
S. H. Glenzer,
J. S. Green,
R. J. Gray,
G. S. Hicks,
C. Hyland,
V. Istokskaia,
M. King,
D. Margarone,
O. McCusker
, et al. (10 additional authors not shown)
Abstract:
The interaction of relativistically intense lasers with opaque targets represents a highly non-linear, multi-dimensional parameter space. This limits the utility of sequential 1D scanning of experimental parameters for the optimisation of secondary radiation, although to-date this has been the accepted methodology due to low data acquisition rates. High repetition-rate (HRR) lasers augmented by ma…
▽ More
The interaction of relativistically intense lasers with opaque targets represents a highly non-linear, multi-dimensional parameter space. This limits the utility of sequential 1D scanning of experimental parameters for the optimisation of secondary radiation, although to-date this has been the accepted methodology due to low data acquisition rates. High repetition-rate (HRR) lasers augmented by machine learning present a valuable opportunity for efficient source optimisation. Here, an automated, HRR-compatible system produced high fidelity parameter scans, revealing the influence of laser intensity on target pre-heating and proton generation. A closed-loop Bayesian optimisation of maximum proton energy, through control of the laser wavefront and target position, produced proton beams with equivalent maximum energy to manually-optimized laser pulses but using only 60% of the laser energy. This demonstration of automated optimisation of laser-driven proton beams is a crucial step towards deeper physical insight and the construction of future radiation sources.
△ Less
Submitted 1 March, 2023;
originally announced March 2023.
-
GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs
Authors:
Jonas Dann,
Daniel Ritter,
Holger Fröning
Abstract:
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data analytics. While FPGAs denote a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processin…
▽ More
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data analytics. While FPGAs denote a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processing accelerators either use the off-chip memory bandwidth inefficiently or do not scale well across memory channels.
In this work, we propose GraphScale, a scalable graph processing framework for FPGAs. For the first time, GraphScale combines multi-channel memory with asynchronous graph processing (i.e., for fast convergence on results) and a compressed graph representation (i.e., for efficient usage of memory bandwidth and reduced memory footprint). GraphScale solves common graph problems like breadth-first search, PageRank, and weakly-connected components through modular user-defined functions, a novel two-dimensional partitioning scheme, and a high-performance two-level crossbar design.
△ Less
Submitted 16 June, 2022;
originally announced June 2022.
-
Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators
Authors:
Jonas Dann,
Daniel Ritter,
Holger Fröning
Abstract:
Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU). While several of these graph accelerators were proposed in recent years, it remains difficult to assess their performance and compare them on common graph worklo…
▽ More
Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU). While several of these graph accelerators were proposed in recent years, it remains difficult to assess their performance and compare them on common graph workloads and accelerator platforms, due to few open source implementations and excessive implementation effort.
In this work, we build on a simulation environment for graph processing accelerators, to make several existing accelerator approaches comparable. This allows us to study relevant performance dimensions such as partitioning schemes and memory technology, among others. The evaluation yields insights into the strengths and weaknesses of current graph processing accelerators along these dimensions, and features a novel in-depth comparison.
△ Less
Submitted 31 March, 2021;
originally announced April 2021.
-
Exploring Memory Access Patterns for Graph Processing Accelerators
Authors:
Jonas Dann,
Daniel Ritter,
Holger Fröning
Abstract:
Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph processing with a customizable memory hierarchy promise solving performance problems caused by inherent irregular memory access patterns on traditional hardwar…
▽ More
Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph processing with a customizable memory hierarchy promise solving performance problems caused by inherent irregular memory access patterns on traditional hardware (e.g., CPU). However, developing such hardware accelerators is yet time-consuming and difficult and benchmarking is non-standardized, hindering comprehension of the impact of memory access pattern changes and systematic engineering of graph processing accelerators.
In this work, we propose a simulation environment for the analysis of graph processing accelerators based on simulating their memory access patterns. Further, we evaluate our approach on two state-of-the-art FPGA graph processing accelerators and show reproducibility, comparablity, as well as the shortened development process by an example. Not implementing the cycle-accurate internal data flow on accelerator hardware like FPGAs significantly reduces the implementation time, increases the benchmark parameter transparency, and allows comparison of graph processing approaches.
△ Less
Submitted 7 February, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Non-Relational Databases on FPGAs: Survey, Design Decisions, Challenges
Authors:
Jonas Dann,
Daniel Ritter,
Holger Fröning
Abstract:
Non-relational database systems (NRDS), such as graph, document, key-value, and wide-column, have gained much attention in various trending (business) application domains like smart logistics, social network analysis, and medical applications, due to their data model variety and scalability. The broad data variety and sheer size of datasets pose unique challenges for the system design and runtime…
▽ More
Non-relational database systems (NRDS), such as graph, document, key-value, and wide-column, have gained much attention in various trending (business) application domains like smart logistics, social network analysis, and medical applications, due to their data model variety and scalability. The broad data variety and sheer size of datasets pose unique challenges for the system design and runtime (incl. power consumption). While CPU performance scaling becomes increasingly more difficult, we argue that NRDS can benefit from adding field programmable gate arrays (FPGAs) as accelerators. However, FPGA-accelerated NRDS have not been systematically studied, yet.
To facilitate understanding of this emerging domain, we explore the fit of FPGA acceleration for NRDS with a focus on data model variety. We define the term NRDS class as a group of non-relational database systems supporting the same data model. This survey describes and categorizes the inherent differences and non-trivial trade-offs of relevant NRDS classes as well as their commonalities in the context of common design decisions when building such a system with FPGAs. For example, we found in the literature that for key-value stores the FPGA should be placed into the system as a smart network interface card (SmartNIC) to benefit from direct access of the FPGA to the network. However, more complex data models and processing of other classes (e.g., graph and document) commonly require more elaborate near-data or socket accelerator placements where the FPGA respectively has the only or shared access to main memory. Across the different classes, FPGAs can be used as communication layer or for acceleration of operators and data access. We close with open research and engineering challenges to outline the future of FPGA-accelerated NRDS.
△ Less
Submitted 15 July, 2020;
originally announced July 2020.
-
An Automated Framework for the Extraction of Semantic Legal Metadata from Legal Texts
Authors:
Amin Sleimi,
Nicolas Sannier,
Mehrdad Sabetzadeh,
Lionel Briand,
Marcello Ceci,
John Dann
Abstract:
Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature d…
▽ More
Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature does not provide a harmonized view on the semantic metadata types that are useful for legal requirements analysis; (2) automated support for the extraction of semantic legal metadata is scarce, and it does not exploit the full potential of artificial intelligence technologies, notably natural language processing (NLP) and machine learning (ML). Our objective is to take steps toward overcoming these limitations. To do so, we review and reconcile the semantic legal metadata types proposed in the RE literature. Subsequently, we devise an automated extraction approach for the identified metadata types using NLP and ML. We evaluate our approach through two case studies over the Luxembourgish legislation. Our results indicate a high accuracy in the generation of metadata annotations. In particular, in the two case studies, we were able to obtain precision scores of 97.2% and 82.4% and recall scores of 94.9% and 92.4%.
△ Less
Submitted 30 January, 2020;
originally announced January 2020.