arXiv:1902.03192

Software-Defined FPGA Accelerator Design for Mobile Deep Learning Applications

Authors: Panagiotis G. Mousouliotis, Loukas P. Petrou

Abstract: Recently, the field of deep learning has received great attention by the scientific community and it is used to provide improved solutions to many computer vision problems. Convolutional neural networks (CNNs) have been successfully used to attack problems such as object recognition, object detection, semantic segmentation, and scene understanding. The rapid development of deep learning goes hand by hand with the adaptation of GPUs for accelerating its processes, such as network training and inference. Even though FPGA design exists long before the use of GPUs for accelerating computations and despite the fact that high-level synthesis (HLS) tools are getting more attractive, the adaptation of FPGAs for deep learning research and application development is poor due to the requirement of hardware design related expertise. This work presents a workflow for deep learning mobile application acceleration on small low-cost low-power FPGA devices using HLS tools. This workflow eases the design of an improved version of the SqueezeJet accelerator used for the speedup of mobile-friendly low-parameter ImageNet class CNNs, such as the SqueezeNet v1.1 and the ZynqNet. Additionally, the workflow includes the development of an HLS-driven analytical model which is used for performance estimation of the accelerator. This model can be also used to direct the design process and lead to future design improvements and optimizations.

Submitted 24 March, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

Comments: Accepted to be presented in the 15th International Symposium on Applied Reconfigurable Computing

arXiv:1811.04863

A Framework of Transfer Learning in Object Detection for Embedded Systems

Authors: Ioannis Athanasiadis, Panagiotis Mousouliotis, Loukas Petrou

Abstract: Transfer learning is one of the subjects undergoing intense study in the area of machine learning. In object recognition and object detection there are known experiments for the transferability of parameters, but not for neural networks which are suitable for object detection in real time embedded applications, such as the SqueezeDet neural network. We use transfer learning to accelerate the training of SqueezeDet to a new group of classes. Also, experiments are conducted to study the transferability and co-adaptation phenomena introduced by the transfer learning process. To accelerate training, we propose a new implementation of the SqueezeDet training which provides a faster pipeline for data processing and achieves 1.8 times speedup compared to the initial implementation. Finally, we created a mechanism for automatic hyperparameter optimization using an empirical method.

Submitted 24 November, 2018; v1 submitted 12 November, 2018; originally announced November 2018.

arXiv:1810.06329

Fault Adaptive Routing in Metasurface Controller Networks

Authors: Taqwa Saeed, Constantinos Skitsas, Dimitrios Kouzapas, Marios Lestas, Vassos Soteriou, Anna Philippou, Sergi Abadal, Christos Liaskos, Loukas Petrou, Julius Georgiou, Andreas Pitsillides

Abstract: HyperSurfaces are a merge of structurally reconfigurable metasurfaces whose electromagnetic properties can be changed via a software interface, using an embedded miniaturized network of controllers, thus enabling novel capabilities in wireless communications. Resource constraints associated with the development of a hardware testbed of this breakthrough technology necessitate network controller architectures different from traditional regular Network-on-Chip architectures. The Manhattan-like topology chosen to realize the controller network in the testbed under development is irregular, with restricted local path selection options, operating in an asynchronous fashion. These characteristics render traditional fault-tolerant routing mechanisms inadequate. In this paper, we present work in progress towards the development of fault-tolerant routing mechanisms for the chosen architecture. We present two XY-based approaches which have been developed aiming to offer reliable data delivery in the presence of faults. The first approach aims to avoid loops while the second one attempts to maximize the success delivery probabilities. Their effectiveness is demonstrated via simulations conducted on a custom developed simulator.

Submitted 15 October, 2018; originally announced October 2018.

Comments: 6 pages, 8 figures, conference

arXiv:1807.09339

Formal Verification of a Programmable Hypersurface

Authors: Panagiotis Kouvaros, Dimitris Kouzapas, Anna Philippou, Julius Georgiou, Loukas Petrou, Andreas Pitsillides

Abstract: A metasurface is a surface that consists of artificial material, called metamaterial, with configurable electromagnetic properties. This paper presents work in progress on the design and formal verification of a programmable metasurface, the Hypersurface, as part of the requirements of the VISORSURF research program (HORIZON 2020 FET-OPEN). The Hypersurface design is concerned with the development of a network of switch controllers that are responsible for configuring the metamaterial. The design of the Hypersurface, however, has demanding requirements that need to be delivered within a context of limited resources. This paper shares the experience of a rigorous design procedure for the Hypersurface network, that involves iterations between designing a network and its protocols and the formal evaluation of each design. Formal evaluation has provided results that, so far, drive the development team in a more robust design and overall aid in reducing the cost of the Hypersurface manufacturing. This paper presents work in progress on the design and formal verification of a programmable Hypersurface as part of the requirements of the VISORSURF research programme (HORIZON 2020 FET-OPEN).

Submitted 17 July, 2018; originally announced July 2018.

Comments: 13 pages. The paper has been accepted at FMICS 2018 and will be published by Springer

arXiv:1805.08695

doi: 10.1007/978-3-319-78890-6_5

SqueezeJet: High-level Synthesis Accelerator Design for Deep Convolutional Neural Networks

Authors: Panagiotis G. Mousouliotis, Loukas P. Petrou

Abstract: Deep convolutional neural networks have dominated the pattern recognition scene by providing much more accurate solutions in computer vision problems such as object recognition and object detection. Most of these solutions come at a huge computational cost, requiring billions of multiply-accumulate operations and, thus, making their use quite challenging in real-time applications that run on embedded mobile (resource-power constrained) hardware. This work presents the architecture, the high-level synthesis design, and the implementation of SqueezeJet, an FPGA accelerator for the inference phase of the SqueezeNet DCNN architecture, which is designed specifically for use in embedded systems. Results show that SqueezeJet can achieve 15.16 times speed-up compared to the software implementation of SqueezeNet running on an embedded mobile processor with less than 1% drop in top-5 accuracy.

Submitted 6 May, 2018; originally announced May 2018.

Comments: The final publication is available at Springer via https://doi.org/10.1007/978-3-319-78890-6_5

arXiv:1804.00512

Expanding a robot's life: Low power object recognition via FPGA-based DCNN deployment

Authors: Panagiotis G. Mousouliotis, Konstantinos L. Panayiotou, Emmanouil G. Tsardoulias, Loukas P. Petrou, Andreas L. Symeonidis

Abstract: FPGAs are commonly used to accelerate domain-specific algorithmic implementations, as they can achieve impressive performance boosts, are reprogrammable and exhibit minimal power consumption. In this work, the SqueezeNet DCNN is accelerated using an SoC FPGA in order for the offered object recognition resource to be employed in a robotic application. Experiments are conducted to investigate the performance and power consumption of the implementation in comparison to deployment on other widely-used computational systems.

Submitted 23 March, 2018; originally announced April 2018.

Comments: Accepted in MOCAST 2018

arXiv:1711.06591

Metric Map Merging using RFID Tags & Topological Information

Authors: Emmanouil Tsardoulias, Aristeidis Thallas, Loukas Petrou

Abstract: A map merging component is crucial for the proper functionality of a multi-robot system performing exploration, since it provides the means to integrate and distribute the most important information carried by the agents: the explored-covered space and its exact (depending on the SLAM accuracy) morphology. Map merging is a prerequisite for an intelligent multi-robot team aiming to deploy a smart exploration technique. In the current work, a metric map merging approach based on environmental information is proposed, in conjunction with spatially scattered RFID tags localization. This approach is divided into the following parts: the maps approximate rotation calculation via the obstacles poses and localized RFID tags, the translation employing the best localized common RFID tag and finally the transformation refinement using an ICP algorithm.

Submitted 17 November, 2017; originally announced November 2017.

Comments: Autonomous robots, Mapping, Map-Merging, RFIDs, RANSAC, ICP

Showing 1–7 of 7 results for author: Petrou, L