
Showing 1–37 of 37 results for author: Bouganis, C

Searching in archive cs.
  1. arXiv:2505.08981

    cs.AR

    ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition

    Authors: Keran Zheng, Yinting Huang, Zhewen Yu, Christos-Savvas Bouganis

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated impressive capabilities as their scale expands to billions of parameters. Deploying these large-scale models on resource-constrained platforms presents significant challenges, with post-training fixed-point quantization often used as a model compression technique. However, quantization-only methods typically lead to significant…

    Submitted 13 May, 2025; originally announced May 2025.

  2. arXiv:2502.04923

    cs.CV cs.AI

    Cached Multi-Lora Composition for Multi-Concept Image Generation

    Authors: Xiandong Zou, Mingzhu Shen, Christos-Savvas Bouganis, Yiren Zhao

    Abstract: Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models, enabling precise rendering of multiple distinct elements, such as characters and styles, in multi-concept image generation. However, current approaches face significant challenges when composing these LoRAs for multi-concept image generation, resulting in diminished generated image quality. In this paper,…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: The Thirteenth International Conference on Learning Representations (ICLR 2025)

  3. arXiv:2406.03088

    cs.AR cs.LG

    HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

    Authors: Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao

    Abstract: Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and i…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted to FPL2024

  4. arXiv:2406.01125

    cs.CV

    Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

    Authors: Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, Tao Chen

    Abstract: Diffusion models are widely recognized for generating high-quality and diverse images, but their poor real-time performance has led to numerous acceleration works, primarily focusing on UNet-based structures. With the more successful results achieved by diffusion transformers (DiT), there is still a lack of exploration regarding the impact of DiT structure on generation, as well as the absence of…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 6 tables

  5. arXiv:2403.18921

    cs.AR cs.CV cs.LG

    SMOF: Streaming Modern CNNs on FPGAs with Smart Off-Chip Eviction

    Authors: Petros Toupas, Zhewen Yu, Christos-Savvas Bouganis, Dimitrios Tzovaras

    Abstract: Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in numerous vision tasks. However, their high processing requirements necessitate efficient hardware acceleration to meet the application's performance targets. In the space of FPGAs, streaming-based dataflow architectures are often adopted by users, as significant performance gains can be achieved through layer-wise pipeli…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 12 pages, 8 figures, 5 tables

  6. arXiv:2403.14715

    cs.LG cs.AI cs.CV

    Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It

    Authors: Guoxuan Xia, Olivier Laurent, Gianni Franchi, Christos-Savvas Bouganis

    Abstract: Label smoothing (LS) is a popular regularisation method for training neural networks as it is effective in improving test accuracy and is simple to implement. "Hard" one-hot labels are "smoothed" by uniformly distributing probability mass to other classes, reducing overfitting. Prior work has suggested that in some cases LS can degrade selective classification (SC), where the aim is to rejec…

    Submitted 20 February, 2025; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at ICLR 2025

  7. arXiv:2311.04764

    cs.AR

    AutoWS: Automate Weights Streaming in Layer-wise Pipelined DNN Accelerators

    Authors: Zhewen Yu, Christos-Savvas Bouganis

    Abstract: With the great success of Deep Neural Networks (DNN), the design of efficient hardware accelerators has triggered wide interest in the research community. Existing research explores two architectural strategies: sequential layer execution and layer-wise pipelining. While the former supports a wider range of models, the latter is favoured for its enhanced customization and efficiency. A challenge f…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: accepted by DATE2024

  8. arXiv:2309.01587

    cs.AR cs.CV cs.LG

    SATAY: A Streaming Architecture Toolflow for Accelerating YOLO Models on FPGA Devices

    Authors: Alexander Montgomerie-Corcoran, Petros Toupas, Zhewen Yu, Christos-Savvas Bouganis

    Abstract: AI has led to significant advancements in computer vision and image processing tasks, enabling a wide range of applications in real-life scenarios, from autonomous vehicles to medical imaging. Many of those applications require efficient object detection algorithms and complementary real-time, low latency hardware to perform inference of these algorithms. The YOLO family of models is considered th…

    Submitted 4 September, 2023; originally announced September 2023.

  9. arXiv:2307.15517

    cs.AR

    A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats

    Authors: Jianyi Cheng, Cheng Zhang, Zhewen Yu, Christos-Savvas Bouganis, George A. Constantinides, Yiren Zhao

    Abstract: Model quantization represents both parameters (weights) and intermediate values (activations) in a more compact format, thereby directly reducing both computational and memory cost in hardware. The quantization of recent large language models (LLMs) faces challenges in achieving competitive memory density compared to other models such as convolutional neural networks, since values in LLMs require la…

    Submitted 19 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  10. arXiv:2307.07821

    cs.AR

    PASS: Exploiting Post-Activation Sparsity in Streaming Architectures for CNN Acceleration

    Authors: Alexander Montgomerie-Corcoran, Zhewen Yu, Jianyi Cheng, Christos-Savvas Bouganis

    Abstract: With the ever-growing popularity of Artificial Intelligence, there is an increasing demand for more performant and efficient underlying hardware. Convolutional Neural Networks (CNN) are a workload of particular importance, which achieve high accuracy in computer vision applications. Inside CNNs, a significant number of the post-activation values are zero, resulting in many redundant computations.…

    Submitted 15 July, 2023; originally announced July 2023.

  11. arXiv:2306.05021

    cs.LG cs.AR

    Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition

    Authors: Zhewen Yu, Christos-Savvas Bouganis

    Abstract: Neural Network designs are quite diverse, from VGG-style to ResNet-style, and from Convolutional Neural Networks to Transformers. Towards the design of efficient accelerators, many works have adopted a dataflow-based, inter-layer pipelined architecture, with customised hardware for each layer, achieving ultra-high throughput and low latency. The deployment of neural networks to such dataflow…

    Submitted 22 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: accepted by FPL2023

  12. arXiv:2305.19896

    cs.AR cs.AI cs.CV cs.LG

    fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs

    Authors: Petros Toupas, Christos-Savvas Bouganis, Dimitrios Tzovaras

    Abstract: Surveillance systems, autonomous vehicles, human monitoring systems, and video retrieval are just a few of the many applications in which 3D Convolutional Neural Networks are exploited. However, their extensive use is restricted by their high computational and memory requirements, especially when integrated into systems with limited resources. This study proposes a toolflow that optimises the mappin…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 7 pages, 3 figures, 4 tables. arXiv admin note: substantial text overlap with arXiv:2305.18479

  13. arXiv:2305.18479

    cs.CV cs.AI cs.AR cs.LG

    FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition

    Authors: Petros Toupas, Christos-Savvas Bouganis, Dimitrios Tzovaras

    Abstract: 3D Convolutional Neural Networks are gaining increasing attention from researchers and practitioners and have found applications in many domains, such as surveillance systems, autonomous vehicles, human monitoring systems, and video retrieval. However, their widespread adoption is hindered by their high computational and memory requirements, especially when resource-constrained systems are targete…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 8 pages, 6 figures, 2 tables

  14. arXiv:2304.08400

    cs.AR cs.LG

    ATHEENA: A Toolflow for Hardware Early-Exit Network Automation

    Authors: Benjamin Biggs, Christos-Savvas Bouganis, George A. Constantinides

    Abstract: The continued need for improvements in accuracy, throughput, and efficiency of Deep Neural Networks has resulted in a multitude of methods that make the most of custom architectures on FPGAs. These include the creation of hand-crafted networks and the use of quantization and pruning to reduce extraneous network parameters. However, with the potential of static solutions already well exploited, we…

    Submitted 14 April, 2025; v1 submitted 17 April, 2023; originally announced April 2023.

    Journal ref: Proceedings of IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2023, 121-132

  15. HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices

    Authors: Petros Toupas, Alexander Montgomerie-Corcoran, Christos-Savvas Bouganis, Dimitrios Tzovaras

    Abstract: For Human Action Recognition (HAR) tasks, 3D Convolutional Neural Networks have proven to be highly effective, achieving state-of-the-art results. This study introduces a novel streaming-architecture-based toolflow for mapping such models onto FPGAs, considering the model's inherent characteristics and the features of the targeted FPGA device. The HARFLOW3D toolflow takes as input a 3D CNN in ONNX…

    Submitted 29 May, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: 11 pages, 8 figures, 6 tables

  16. arXiv:2303.08010

    cs.LG cs.AI cs.CV

    Window-Based Early-Exit Cascades for Uncertainty Estimation: When Deep Ensembles are More Efficient than Single Models

    Authors: Guoxuan Xia, Christos-Savvas Bouganis

    Abstract: Deep Ensembles are a simple, reliable, and effective method of improving both the predictive performance and uncertainty estimates of deep learning approaches. However, they are widely criticised as being computationally expensive, due to the need to deploy multiple independent models. Recent work has challenged this view, showing that for predictive accuracy, ensembles can be more computationally…

    Submitted 9 October, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023 (camera-ready version, 9 pages)

  17. arXiv:2208.10404

    cs.LG

    SVD-NAS: Coupling Low-Rank Approximation and Neural Architecture Search

    Authors: Zhewen Yu, Christos-Savvas Bouganis

    Abstract: The task of compressing pre-trained Deep Neural Networks has attracted wide interest from the research community due to its great benefits in freeing practitioners from data access requirements. In this domain, low-rank approximation is a promising method, but existing solutions considered a restricted number of design choices and failed to efficiently explore the design space, which led to severe…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: Accepted at WACV 2023

  18. arXiv:2207.07517

    cs.LG cs.AI cs.CV

    On the Usefulness of Deep Ensemble Diversity for Out-of-Distribution Detection

    Authors: Guoxuan Xia, Christos-Savvas Bouganis

    Abstract: The ability to detect Out-of-Distribution (OOD) data is important in safety-critical applications of deep learning. The aim is to separate In-Distribution (ID) data drawn from the training distribution from OOD data using a measure of uncertainty extracted from a deep neural network. Deep Ensembles are a well-established method of improving the quality of uncertainty estimates produced by deep neu…

    Submitted 20 September, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: Workshop on Uncertainty Quantification for Computer Vision, ECCV 2022

  19. arXiv:2207.07506

    cs.LG cs.AI cs.CV

    Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data

    Authors: Guoxuan Xia, Christos-Savvas Bouganis

    Abstract: Detecting out-of-distribution (OOD) data is a task that is receiving an increasing amount of research attention in the domain of deep learning for computer vision. However, the performance of detection methods is generally evaluated on the task in isolation, rather than also considering potential downstream tasks in tandem. In this work, we examine selective classification in the presence of OOD d…

    Submitted 14 March, 2023; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: ACCV 2022 (Best Paper Award) https://openaccess.thecvf.com/content/ACCV2022/html/Xia_Augmenting_Softmax_Information_for_Selective_Classification_with_Out-of-Distribution_Data_ACCV_2022_paper.html

  20. arXiv:2205.09376

    cs.AR cs.AI cs.LG

    Multi-DNN Accelerators for Next-Generation AI Systems

    Authors: Stylianos I. Venieris, Christos-Savvas Bouganis, Nicholas D. Lane

    Abstract: As the use of AI-powered applications widens across multiple domains, the computational demands grow accordingly. The primary drivers of AI technology are deep neural networks (DNNs). When focusing either on cloud-based systems that serve multiple AI queries from different users, each with their own DNN model, or on mobile robots and smartphones employing pipelines of various models or parallel DNNs f…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Accepted for publication at the IEEE Computer journal, 2022

  21. arXiv:2203.00772

    cs.CV cs.AI

    Low-Cost On-device Partial Domain Adaptation (LoCO-PDA): Enabling efficient CNN retraining on edge devices

    Authors: Aditya Rajagopal, Christos-Savvas Bouganis

    Abstract: With the increased deployment of Convolutional Neural Networks (CNNs) on edge devices, the uncertainty of the observed data distribution upon deployment has led researchers to utilise large and extensive datasets such as ILSVRC'12 to train CNNs. Consequently, it is likely that the observed data distribution upon deployment is a subset of the training data distribution. In such cases, not adapti…

    Submitted 1 March, 2022; originally announced March 2022.

  22. arXiv:2112.00170

    cs.AR

    SAMO: Optimised Mapping of Convolutional Neural Networks to Streaming Architectures

    Authors: Alexander Montgomerie-Corcoran, Zhewen Yu, Christos-Savvas Bouganis

    Abstract: Significant effort has been placed on the development of toolflows that map Convolutional Neural Network (CNN) models to Field Programmable Gate Arrays (FPGAs) with the aim of automating the production of high-performing designs for a diverse set of applications. However, within these toolflows, the problem of finding an optimal mapping is often overlooked, with the expectation that the end user w…

    Submitted 9 August, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

  23. arXiv:2108.05580

    cs.LG cs.AI cs.CV

    perf4sight: A toolflow to model CNN training performance on Edge GPUs

    Authors: Aditya Rajagopal, Christos-Savvas Bouganis

    Abstract: The increased memory and processing capabilities of today's edge devices create opportunities for greater edge intelligence. In the domain of vision, the ability to adapt a Convolutional Neural Network's (CNN) structure and parameters to the input data distribution leads to systems with lower memory footprint, latency and power consumption. However, due to the limited compute resources and memory…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: Accepted into the Workshop on Embedded and Real-World Computer Vision in Autonomous Driving (ERCVAD), ICCV 2021

  24. arXiv:2107.10047

    cs.PF

    Performance landscape of resource-constrained platforms targeting DNNs

    Authors: Panagiotis Miliadis, Christos-Savvas Bouganis, Dionisios Pnevmatikatos

    Abstract: In recent years, a significant number of complex, deep neural networks have been developed for a variety of applications including speech and face recognition, computer vision in the areas of healthcare, automatic translation, image classification, etc. Moreover, there is an increasing demand for deploying these networks on resource-constrained edge devices. As the computational demands of t…

    Submitted 3 November, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

  25. arXiv:2006.13829

    cs.DC cs.LG

    Caffe Barista: Brewing Caffe with FPGAs in the Training Loop

    Authors: Diederik Adriaan Vink, Aditya Rajagopal, Stylianos I. Venieris, Christos-Savvas Bouganis

    Abstract: As the complexity of deep learning (DL) models increases, their compute requirements increase accordingly. Deploying a Convolutional Neural Network (CNN) involves two phases: training and inference. With the inference task typically taking place on resource-constrained devices, a lot of research has explored the field of low-power inference on custom hardware accelerators. On the other hand, train…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: Published as short paper at FPL2020

  26. arXiv:2006.09049

    cs.CV cs.LG

    Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs

    Authors: Aditya Rajagopal, Diederik Adriaan Vink, Stylianos I. Venieris, Christos-Savvas Bouganis

    Abstract: Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from hours to weeks, limiting the productivity and experimentation of deep learning practitioners. As networks grow in size and complexity, training time can be reduced through low-precision data representations and computations. However, in doing so the final accuracy suffers due to the problem of vani…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: Accepted at the 37th International Conference on Machine Learning (ICML), 2020

  27. arXiv:2006.08554

    cs.CV cs.LG

    Now that I can see, I can improve: Enabling data-driven finetuning of CNNs on the edge

    Authors: Aditya Rajagopal, Christos-Savvas Bouganis

    Abstract: In today's world, a vast amount of data is being generated by edge devices that can be used as valuable training data to improve the performance of machine learning algorithms in terms of the achieved accuracy or to reduce the compute requirements of the model. However, due to user data privacy concerns as well as storage and communication bandwidth limitations, this data cannot be moved from the…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted for publication at CVPR2020 workshop - Efficient Deep Learning for Computer Vision

  28. arXiv:1905.00689

    eess.SP cs.LG cs.RO

    Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars

    Authors: Alexandros Kouris, Stylianos I. Venieris, Michail Rizakis, Christos-Savvas Bouganis

    Abstract: The need to recognise long-term dependencies in sequential data such as video streams has made Long Short-Term Memory (LSTM) networks a prominent Artificial Intelligence model for many emerging applications. However, the high computational and memory demands of LSTMs introduce challenges in their deployment on latency-critical systems such as self-driving cars which are equipped with limited compu…

    Submitted 30 October, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: PREPRINT: Accepted for publication at the IEEE Consumer Electronics Magazine (CEM). [Acceptance Date: 28-Oct-2019]

  29. arXiv:1902.04907

    cs.RO

    A Scalable FPGA-based Architecture for Depth Estimation in SLAM

    Authors: Konstantinos Boikos, Christos-Savvas Bouganis

    Abstract: The current state of the art of Simultaneous Localisation and Mapping, or SLAM, on low-power embedded systems is about sparse localisation and mapping with low-resolution results in the name of efficiency. Meanwhile, research in this field has provided many advances for information-rich processing and semantic understanding, combined with high computational requirements for real-time processing. T…

    Submitted 13 February, 2019; originally announced February 2019.

    Comments: Accepted for publication in the 15th International Symposium on Applied Reconfigurable Computing (ARC 2019). This is a pre-print. The final version will be published by Springer-Verlag in the Lecture Notes in Computer Science series

  30. DroNet: Efficient convolutional neural network detector for real-time UAV applications

    Authors: Christos Kyrkou, George Plastiras, Stylianos Venieris, Theocharis Theocharides, Christos-Savvas Bouganis

    Abstract: Unmanned Aerial Vehicles (drones) are emerging as a promising technology for both environmental and infrastructure monitoring, with broad use in a plethora of applications. Many such applications require the use of computer vision algorithms in order to analyse the information captured from an on-board camera. Such applications include detecting vehicles for emergency response and traffic monitori…

    Submitted 18 July, 2018; originally announced July 2018.

    Comments: C. Kyrkou, G. Plastiras, T. Theocharides, S. I. Venieris and C. S. Bouganis, "DroNet: Efficient convolutional neural network detector for real-time UAV applications," 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, 2018, pp. 967-972. Keywords: Convolutional neural networks, Machine learning, autonomous aerial vehicles, computer vision, embedded systems

    Journal ref: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)

  31. arXiv:1807.05053

    cs.CV cs.LG

    CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

    Authors: Alexandros Kouris, Stylianos I. Venieris, Christos-Savvas Bouganis

    Abstract: This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference. A two-stage architecture tailored for any given CNN-FPGA pair is generated, consisting of a low- and high-precision unit in a cascade. A confidence evaluation unit is employed to identify misclassified cases from the excessively low-precision…

    Submitted 13 July, 2018; originally announced July 2018.

    Comments: Accepted for publication at the 28th International Conference on Field Programmable Logic & Applications (FPL), 2018

  32. arXiv:1806.08616

    cs.CV cs.AI cs.LG

    Deploying Deep Neural Networks in the Embedded Space

    Authors: Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

    Abstract: Recently, Deep Neural Networks (DNNs) have emerged as the dominant model across various AI applications. In the era of IoT and mobile systems, the efficient deployment of DNNs on embedded platforms is vital to enable the development of intelligent applications. This paper summarises our recent work on the optimised mapping of DNNs on embedded settings. By covering such diverse topics as DNN-to-acc…

    Submitted 22 June, 2018; originally announced June 2018.

    Comments: Accepted at MobiSys18: 2nd International Workshop on Embedded and Mobile Deep Learning (EMDL) 2018

  33. arXiv:1805.10174

    cs.CV cs.AI cs.AR

    f-CNN^x: A Toolflow for Mapping Multi-CNN Applications on FPGAs

    Authors: Stylianos I. Venieris, Christos-Savvas Bouganis

    Abstract: The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a particular task. The efficient mapping of multiple CNNs on a single FPGA device is a challenging task as the allocation of compute resources and external memory bandwid…

    Submitted 7 June, 2021; v1 submitted 25 May, 2018; originally announced May 2018.

    Comments: Accepted at the 28th International Conference on Field Programmable Logic & Applications (FPL) 2018

  34. arXiv:1805.08743

    cs.CV cs.AI cs.LG

    CascadeCNN: Pushing the performance limits of quantisation

    Authors: Alexandros Kouris, Stylianos I. Venieris, Christos-Savvas Bouganis

    Abstract: This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, to perform high-throughput inference by exploiting the computation time-accuracy trade-off. Without the need for retraining, a two-stage architecture tailored for any given FPGA device is generated, consisting of a low- and a high-precision unit. A confidence evaluation unit is employed…

    Submitted 22 May, 2018; originally announced May 2018.

    Comments: Accepted at SysML Conference 2018

  35. arXiv:1803.05900

    cs.CV cs.AR cs.LG

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Authors: Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

    Abstract: In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative p…

    Submitted 15 March, 2018; originally announced March 2018.

    Comments: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 2018

  36. arXiv:1801.02190

    cs.CV cs.AR cs.LG

    Approximate FPGA-based LSTMs under Computation Time Constraints

    Authors: Michalis Rizakis, Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

    Abstract: Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, the models are becoming increasingly demanding in terms of computational and memory load. Emerging latency-sensitive applications including mobile robots and autonomous vehicles often operate under stringent compu…

    Submitted 7 January, 2018; originally announced January 2018.

    Comments: Accepted at the 14th International Symposium on Applied Reconfigurable Computing (ARC) 2018

  37. arXiv:1711.08740

    cs.CV cs.AR cs.LG

    fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs

    Authors: Stylianos I. Venieris, Christos-Savvas Bouganis

    Abstract: In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly, from high-throughput video surveillance to the very low-latency requirements of autonomous cars. In this context, FPGAs can provide a potential platform that can b…

    Submitted 23 November, 2017; originally announced November 2017.

    Comments: Accepted at NIPS 2017 Workshop on Machine Learning on the Phone and other Consumer Devices