-
Evaluation of CGRA Toolchains
Authors:
Dominik Walter,
Marita Halm,
Daniel Seidel,
Indrayudh Ghosh,
Christian Heidorn,
Frank Hannig,
Jürgen Teich
Abstract:
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class for such accelerators are so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements (PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic par…
▽ More
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class for such accelerators are so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements (PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic parallelism of such loops. Coarse-grained reconfigurable arrays (CGRAs) belong to this class of accelerator architectures. In this work, we analyze four toolchains for mapping loop programs onto CGRAs and compare the resulting mappings wrt. performance, i.e., latency. While most toolchains succeed in simpler kernels like general matrix multiplication, some struggle to find valid mappings for more complex loops like a triangular solver. Furthermore, we observe that the considered CGRA mappers generally tend to underutilize the available PEs.
△ Less
Submitted 27 February, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs
Authors:
Dominik Walter,
Marita Halm,
Daniel Seidel,
Indrayudh Ghosh,
Christian Heidorn,
Frank Hannig,
Jürgen Teich
Abstract:
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements~(PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic para…
▽ More
Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate a two-dimensional mesh of interconnected processing elements~(PEs). Such arrays are specifically designed to accelerate the execution of multidimensional nested loops by exploiting the intrinsic parallelism of loops. Moreover, for mapping a given loop nest application, two opposed mapping methods have emerged: Operation-centric and iteration-centric. Both differ in the granularity of the mapping. The operation-centric approach maps individual operations to the PEs of the array, while the iteration-centric approach maps entire tiles of iterations to each PE. The operation-centric approach is applied predominantly for processor arrays often referred to as Coarse-Grained Reconfigurable Arrays~(CGRAs), while processor arrays supporting an iteration-centric approach are referred to as Tightly-Coupled Processor Arrays~(TCPAs) in the following. This work provides a comprehensive comparison of both approaches and related architectures by evaluating their respective benefits and trade-offs. ...
△ Less
Submitted 17 February, 2025;
originally announced February 2025.
-
TreeLearn: A deep learning method for segmenting individual trees from ground-based LiDAR forest point clouds
Authors:
Jonathan Henrich,
Jan van Delden,
Dominik Seidel,
Thomas Kneib,
Alexander Ecker
Abstract:
Laser-scanned point clouds of forests make it possible to extract valuable information for forest management. To consider single trees, a forest point cloud needs to be segmented into individual tree point clouds. Existing segmentation methods are usually based on hand-crafted algorithms, such as identifying trunks and growing trees from them, and face difficulties in dense forests with overlappin…
▽ More
Laser-scanned point clouds of forests make it possible to extract valuable information for forest management. To consider single trees, a forest point cloud needs to be segmented into individual tree point clouds. Existing segmentation methods are usually based on hand-crafted algorithms, such as identifying trunks and growing trees from them, and face difficulties in dense forests with overlapping tree crowns. In this study, we propose TreeLearn, a deep learning-based approach for tree instance segmentation of forest point clouds. TreeLearn is trained on already segmented point clouds in a data-driven manner, making it less reliant on predefined features and algorithms. Furthermore, TreeLearn is implemented as a fully automatic pipeline and does not rely on extensive hyperparameter tuning, which makes it easy to use. Additionally, we introduce a new manually segmented benchmark forest dataset containing 156 full trees. The data is generated by mobile laser scanning and contributes to create a larger and more diverse data basis for model development and fine-grained instance segmentation evaluation. We trained TreeLearn on forest point clouds of 6665 trees, labeled using the Lidar360 software. An evaluation on the benchmark dataset shows that TreeLearn performs as well as the algorithm used to generate its training data. Furthermore, the performance can be vastly improved by fine-tuning the model using manually annotated datasets. We evaluate TreeLearn on our benchmark dataset and the Wytham Woods dataset, outperforming the recent SegmentAnyTree, ForAINet and TLS2Trees methods. The TreeLearn code and all datasets that were created in the course of this work are made publicly available.
△ Less
Submitted 6 January, 2025; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Learning to Exploit Elastic Actuators for Quadruped Locomotion
Authors:
Antonin Raffin,
Daniel Seidel,
Jens Kober,
Alin Albu-Schäffer,
João Silvério,
Freek Stulp
Abstract:
Spring-based actuators in legged locomotion provide energy-efficiency and improved performance, but increase the difficulty of controller design. While previous work has focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern gene…
▽ More
Spring-based actuators in legged locomotion provide energy-efficiency and improved performance, but increase the difficulty of controller design. While previous work has focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern generators (CPGs), whose parameters are optimized to quickly obtain an open-loop controller that achieves efficient locomotion. Then, to make this controller more robust and further improve the performance, we use reinforcement learning to close the loop, to learn corrective actions on top of the CPGs. We evaluate the proposed approach on the DLR elastic quadruped bert. Our results in learning trotting and pronking gaits show that exploitation of the spring actuator dynamics emerges naturally from optimizing for dynamic motions, yielding high-performing locomotion, particularly the fastest walking gait recorded on bert, despite being model-free. The whole process takes no more than 1.5 hours on the real robot and results in natural-looking gaits.
△ Less
Submitted 20 August, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Improvements for Free
Authors:
Daniel Seidel,
Janis Voigtländer
Abstract:
"Theorems for Free!" (Wadler, FPCA 1989) is a slogan for a technique that allows to derive statements about functions just from their types. So far, the statements considered have always had a purely extensional flavor: statements relating the value semantics of program expressions, but not statements relating their runtime (or other) cost. Here we study an extension of the technique that allows p…
▽ More
"Theorems for Free!" (Wadler, FPCA 1989) is a slogan for a technique that allows to derive statements about functions just from their types. So far, the statements considered have always had a purely extensional flavor: statements relating the value semantics of program expressions, but not statements relating their runtime (or other) cost. Here we study an extension of the technique that allows precisely statements of the latter flavor, by deriving quantitative theorems for free. After developing the theory, we walk through a number of example derivations. Probably none of the statements derived in those simple examples will be particularly surprising to most readers, but what is maybe surprising, and at the very least novel, is that there is a general technique for obtaining such results on a quantitative level in a principled way. Moreover, there is good potential to bring that technique to bear on more complex examples as well. We turn our attention to short-cut fusion (Gill et al., FPCA 1993) in particular.
△ Less
Submitted 6 July, 2011;
originally announced July 2011.