Showing 1–47 of 47 results for author: Luk, W

  1. arXiv:2506.07069  [pdf, ps, other]

    cs.GR cs.AR cs.CV cs.LG

    Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization

    Authors: Zhican Wang, Guanghui He, Dantong Liu, Lingjun Gao, Shell Xu Hu, Chen Zhang, Zhuoran Song, Nicholas Lane, Wayne Luk, Hongxiang Fan

    Abstract: 3D Gaussian Splatting (3DGS) has recently gained significant attention for high-quality and efficient view synthesis, making it widely adopted in fields such as AR/VR, robotics, and autonomous driving. Despite its impressive algorithmic performance, real-time rendering on resource-constrained devices remains a major challenge due to tight power and area budgets. This paper presents an architecture…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Preprint. Under review

  2. arXiv:2506.06817  [pdf, ps, other]

    cs.AR cs.LG cs.NE cs.PF

    ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors

    Authors: Haoran Wu, Ce Guo, Wayne Luk, Robert Mullins

    Abstract: Bayesian Optimization (BO) has shown promise in tuning processor design parameters. However, standard BO does not support constraints involving categorical parameters such as types of branch predictors and division circuits. In addition, optimization time of BO grows with processor complexity, which becomes increasingly significant especially for FPGA-based soft processors. This paper introduces A…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted to International Conference on Field-Programmable Logic and Applications (FPL) 2025

    Journal ref: Proc. Int. Conf. Field-Programmable Logic and Applications (FPL), 2025

  3. arXiv:2505.12942  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    A3: an Analytical Low-Rank Approximation Framework for Attention

    Authors: Jeffrey T. H. Wong, Cheng Zhang, Xinye Cao, Pedro Gimenes, George A. Constantinides, Wayne Luk, Yiren Zhao

    Abstract: Large language models have demonstrated remarkable performance; however, their massive parameter counts make deployment highly expensive. Low-rank approximation offers a promising compression solution, yet existing approaches have two main limitations: (1) They focus on minimizing the output error of individual linear layers, without considering the architectural characteristics of Transformers, a…

    Submitted 25 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  4. arXiv:2504.14557  [pdf, other]

    quant-ph cs.MA

    Enhancing LLM-based Quantum Code Generation with Multi-Agent Optimization and Quantum Error Correction

    Authors: Charlie Campbell, Hao Mark Chen, Wayne Luk, Hongxiang Fan

    Abstract: Multi-agent frameworks with Large Language Models (LLMs) have become promising tools for generating general-purpose programming languages using test-driven development, allowing developers to create more accurate and robust code. However, their potential has not been fully unleashed for domain-specific programming languages, where specific domain exhibits unique optimization opportunities for cust…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Paper accepted by DAC'25

  5. arXiv:2503.19894  [pdf, other]

    quant-ph cs.ET cs.PF

    Versatile Cross-platform Compilation Toolchain for Schrödinger-style Quantum Circuit Simulation

    Authors: Yuncheng Lu, Shuang Liang, Hongxiang Fan, Ce Guo, Wayne Luk, Paul H. J. Kelly

    Abstract: While existing quantum hardware resources have limited availability and reliability, there is a growing demand for exploring and verifying quantum algorithms. Efficient classical simulators for high-performance quantum simulation are critical to meeting this demand. However, due to the vastly varied characteristics of classical hardware, implementing hardware-specific optimizations for different h…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: To appear in DAC 25

  6. arXiv:2503.12649  [pdf, other]

    cs.LG cs.AI

    FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization

    Authors: Hao Mark Chen, Shell Xu Hu, Wayne Luk, Timothy Hospedales, Hongxiang Fan

    Abstract: Model merging has emerged as a promising approach for multi-task learning (MTL), offering a data-efficient alternative to conventional fine-tuning. However, with the rapid development of the open-source AI ecosystem and the increasing availability of fine-tuned foundation models, existing model merging methods face two key limitations: (i) They are primarily designed for in-house fine-tuned models…

    Submitted 25 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

  7. arXiv:2502.05850  [pdf, other]

    cs.AR cs.LG

    MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration

    Authors: Zhiqiang Que, Jose G. F. Coutinho, Ce Guo, Hongxiang Fan, Wayne Luk

    Abstract: This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy, and resource efficiency. Deploying DNNs on such platforms involves addressing the significant challenge of balancing performance, resource usage (e.g., DSPs and L…

    Submitted 15 May, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: 28 pages, 20 figures

  8. arXiv:2410.19412  [pdf, other]

    cs.LG cs.AI cs.CE econ.EM stat.CO

    Robust Time Series Causal Discovery for Agent-Based Model Validation

    Authors: Gene Yu, Ce Guo, Wayne Luk

    Abstract: Agent-Based Model (ABM) validation is crucial as it helps ensuring the reliability of simulations, and causal discovery has become a powerful tool in this context. However, current causal discovery methods often face accuracy and robustness challenges when applied to complex and noisy time series data, which is typical in ABM scenarios. This study addresses these issues by proposing a Robust Cross…

    Submitted 25 October, 2024; originally announced October 2024.

  9. arXiv:2409.05500  [pdf, other]

    cs.LG cs.DC cs.PF stat.CO

    Optimizing VarLiNGAM for Scalable and Efficient Time Series Causal Discovery

    Authors: Ziyang Jiao, Ce Guo, Wayne Luk

    Abstract: Causal discovery identifies causal relationships in data, but the task is more complex for multivariate time series due to the computational demands of methods like VarLiNGAM, which combines a Vector Autoregressive Model with a Linear Non-Gaussian Acyclic Model. This study optimizes causal discovery specifically for time series data, which are common in practical applications. Time series causal d…

    Submitted 19 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

  10. arXiv:2407.05521  [pdf, other]

    cs.AR cs.AI

    Accelerating MRI Uncertainty Estimation with Mask-based Bayesian Neural Network

    Authors: Zehuan Zhang, Matej Genci, Hongxiang Fan, Andreas Wetscherek, Wayne Luk

    Abstract: Accurate and reliable Magnetic Resonance Imaging (MRI) analysis is particularly important for adaptive radiotherapy, a recent medical advance capable of improving cancer diagnosis and treatment. Recent studies have shown that IVIM-NET, a deep neural network (DNN), can achieve high accuracy in MRI analysis, indicating the potential of deep learning to enhance diagnostic capabilities in healthcare.…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: The 35th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP) 2024

  11. arXiv:2407.01475  [pdf, other]

    cs.AR cs.LG

    Exploring FPGA designs for MX and beyond

    Authors: Ebby Samson, Naveen Mellempudi, Wayne Luk, George A. Constantinides

    Abstract: A number of companies recently worked together to release the new Open Compute Project MX standard for low-precision computation, aimed at efficient neural network implementation. In this paper, we describe and evaluate the first open-source FPGA implementation of the arithmetic defined in the standard. Our designs fully support all the standard's concrete formats for conversion into and out of MX…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  12. arXiv:2406.16198  [pdf, other]

    cs.LG cs.AR

    Hardware-Aware Neural Dropout Search for Reliable Uncertainty Prediction on FPGA

    Authors: Zehuan Zhang, Hongxiang Fan, Hao Mark Chen, Lukasz Dudziak, Wayne Luk

    Abstract: The increasing deployment of artificial intelligence (AI) for critical decision-making amplifies the necessity for trustworthy AI, where uncertainty estimation plays a pivotal role in ensuring trustworthiness. Dropout-based Bayesian Neural Networks (BayesNNs) are prominent in this field, offering reliable uncertainty estimates. Despite their effectiveness, existing dropout-based BayesNNs typically…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference (DAC) 2024

  13. arXiv:2406.14593  [pdf, other]

    cs.LG

    Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA

    Authors: Hao Mark Chen, Liam Castelli, Martin Ferianc, Hongyu Zhou, Shuanglong Liu, Wayne Luk, Hongxiang Fan

    Abstract: Reliable uncertainty estimation plays a crucial role in various safety-critical applications such as medical diagnosis and autonomous driving. In recent years, Bayesian neural networks (BayesNNs) have gained substantial research and industrial interests due to their capability to make accurate predictions with reliable uncertainty estimation. However, the algorithmic complexity and the resulting h…

    Submitted 24 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.06849

  14. arXiv:2405.18628  [pdf, other]

    cs.LG cs.CL

    Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

    Authors: Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan

    Abstract: The auto-regressive decoding of Large Language Models (LLMs) results in significant overheads in their hardware performance. While recent research has investigated various speculative decoding techniques for multi-token generation, these efforts have primarily focused on improving processing speed such as throughput. Crucially, they often neglect other metrics essential for real-life deployments,…

    Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: The code for this implementation is available at https://github.com/hmarkc/parallel-prompt-decoding

  15. arXiv:2405.18168  [pdf, other]

    cs.AR cs.DB

    Sorting-based FPGA Sliding Window Aggregation Engine without off-chip Memories

    Authors: Philippos Papaphilippou, Wayne Luk, David Gregg

    Abstract: Aggregation queries are a series of computationally-demanding analytics operations on grouped and time series data. They include tasks such as summation or finding the median among the items of a group sharing a group ID, and within a specified number of the last observed tuples for sliding window aggregation (SWAG). They have a wide range of applications including in database analytics, operating…

    Submitted 30 November, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  16. arXiv:2402.01876  [pdf, other]

    hep-ex cs.LG physics.ins-det

    Ultrafast jet classification on FPGAs for the HL-LHC

    Authors: Patrick Odagiu, Zhiqiang Que, Javier Duarte, Johannes Haller, Gregor Kasieczka, Artur Lobanov, Vladimir Loncar, Wayne Luk, Jennifer Ngadiuba, Maurizio Pierini, Philipp Rincke, Arpita Seksaria, Sioni Summers, Andre Sznajder, Alexander Tapper, Thea K. Aarrestad

    Abstract: Three machine learning models are used to perform jet origin classification. These models are optimized for deployment on a field-programmable gate array device. In this context, we demonstrate how latency and resource consumption scale with the input size and choice of algorithm. Moreover, the models proposed here are designed to work on the type of data and under the foreseen conditions at the C…

    Submitted 4 July, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: 13 pages, 3 figures, 3 tables. Mach. Learn.: Sci. Technol (2024)

    Report number: FERMILAB-PUB-24-0030-CMS-CSAID-PPD

  17. Deeper Hedging: A New Agent-based Model for Effective Deep Hedging

    Authors: Kang Gao, Stephen Weston, Perukrishnen Vytelingum, Namid R. Stillman, Wayne Luk, Ce Guo

    Abstract: We propose the Chiarella-Heston model, a new agent-based model for improving the effectiveness of deep hedging strategies. This model includes momentum traders, fundamental traders, and volatility traders. The volatility traders participate in the market by innovatively following a Heston-style volatility signal. The proposed model generalises both the extended Chiarella model and the Heston stoch…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted in the 4th ACM International Conference on AI in Finance (ICAIF'23)

  18. arXiv:2308.06849  [pdf, other]

    cs.LG cs.AR

    When Monte-Carlo Dropout Meets Multi-Exit: Optimizing Bayesian Neural Networks on FPGA

    Authors: Hongxiang Fan, Hao Chen, Liam Castelli, Zhiqiang Que, He Li, Kenneth Long, Wayne Luk

    Abstract: Bayesian Neural Networks (BayesNNs) have demonstrated their capability of providing calibrated prediction for safety-critical applications such as medical imaging and autonomous driving. However, the high algorithmic complexity and the poor hardware performance of BayesNNs hinder their deployment in real-life applications. To bridge this gap, this paper proposes a novel multi-exit Monte-Carlo Drop…

    Submitted 13 August, 2023; originally announced August 2023.

  19. MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration

    Authors: Zhiqiang Que, Shuo Liu, Markus Rognlien, Ce Guo, Jose G. F. Coutinho, Wayne Luk

    Abstract: This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators, enabling the rapid development of customized and automated design flows. More specifically, our approach aims to automate the selection and configuration of low-level optimization techniques, encompassing DNN and FPGA low-level optimizations. We introduce novel optimization and transformation…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 5 pages, Accepted at FPL'23

  20. arXiv:2209.14065  [pdf, other]

    cs.AR cs.LG physics.ins-det

    LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

    Authors: Zhiqiang Que, Hongxiang Fan, Marcus Loo, He Li, Michaela Blott, Maurizio Pierini, Alexander Tapper, Wayne Luk

    Abstract: This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedented low latency performance. Incorporating FPGA-based GNNs into particle detectors presents a unique challenge since it requires sub-microsecond latency to deploy the networks for online event selection with a data rate of hundreds of terabytes p…

    Submitted 9 January, 2024; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: This paper has been accepted by ACM Transactions on Embedded Computing Systems (TECS)

  21. arXiv:2209.09570  [pdf, other]

    cs.AR cs.LG

    Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

    Authors: Hongxiang Fan, Thomas Chau, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah

    Abstract: Attention-based neural networks have become pervasive in many AI tasks. Despite their excellent algorithmic performance, the use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources, which often compromises their hardware performance. Although various sparse variants have been introduced, most approaches only focus on mitigating the quadrat…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: Paper accepted by MICRO'22

  22. arXiv:2208.14207  [pdf, other]

    q-fin.CP q-fin.GN

    Understanding intra-day price formation process by agent-based financial market simulation: calibrating the extended chiarella model

    Authors: Kang Gao, Perukrishnen Vytelingum, Stephen Weston, Wayne Luk, Ce Guo

    Abstract: This article presents XGB-Chiarella, a powerful new approach for deploying agent-based models to generate realistic intra-day artificial financial price data. This approach is based on agent-based models, calibrated by XGBoost machine learning surrogate. Following the Extended Chiarella model, three types of trading agents are introduced in this agent-based model: fundamental traders, momentum tra…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: Published in WILMOTT Magazine: May 2022 issue. arXiv admin note: text overlap with arXiv:2208.13654

    Journal ref: Understanding intra-day price formation process by agent-based financial market simulation: calibrating the extended chiarella model, Wilmott, vol. 2022, iss. 119, p. 22-38, 2022

  23. arXiv:2208.13654  [pdf, other]

    q-fin.TR q-fin.CP

    High-frequency financial market simulation and flash crash scenarios analysis: an agent-based modelling approach

    Authors: Kang Gao, Perukrishnen Vytelingum, Stephen Weston, Wayne Luk, Ce Guo

    Abstract: This paper describes simulations and analysis of flash crash scenarios in an agent-based modelling framework. We design, implement, and assess a novel high-frequency agent-based financial market simulator that generates realistic millisecond-level financial price time series for the E-Mini S&P 500 futures market. Specifically, a microstructure model of a single security traded on a central limit o…

    Submitted 29 August, 2022; originally announced August 2022.

    Journal ref: Journal of Artificial Societies and Social Simulation 2024 27 (2) 8 <http://jasss.soc.surrey.ac.uk/27/2/8.html>

  24. FLiMS: a Fast Lightweight 2-way Merger for Sorting

    Authors: Philippos Papaphilippou, Wayne Luk, Chris Brooks

    Abstract: In this paper, we present FLiMS, a highly-efficient and simple parallel algorithm for merging two sorted lists residing in banked and/or wide memory. On FPGAs, its implementation uses fewer hardware resources than the state-of-the-art alternatives, due to the reduced number of comparators and elimination of redundant logic found on prior attempts. In combination with the distributed nature of the…

    Submitted 7 March, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Accepted for publication in the IEEE Transactions on Computers. Extension of the similarly-named FPT 2018 paper, with additional functionality variations, comparisons and experiments

  25. arXiv:2111.12787  [pdf, other]

    cs.LG cs.AR eess.SY

    Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator

    Authors: Hongxiang Fan, Martin Ferianc, Zhiqiang Que, He Li, Shuanglong Liu, Xinyu Niu, Wayne Luk

    Abstract: Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs. Nevertheless, it is still a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation, which makes the exploration on the vast design space of neural architect…

    Submitted 24 November, 2021; originally announced November 2021.

  26. arXiv:2106.14089  [pdf, other]

    cs.LG cs.AR physics.ins-det

    Accelerating Recurrent Neural Networks for Gravitational Wave Experiments

    Authors: Zhiqiang Que, Erwei Wang, Umar Marikar, Eric Moreno, Jennifer Ngadiuba, Hamza Javed, Bartłomiej Borzyszkowski, Thea Aarrestad, Vladimir Loncar, Sioni Summers, Maurizio Pierini, Peter Y Cheung, Wayne Luk

    Abstract: This paper presents novel reconfigurable architectures for reducing the latency of recurrent neural networks (RNNs) that are used for detecting gravitational waves. Gravitational interferometers such as the LIGO detectors capture cosmic events such as black hole mergers which happen at unknown times and of varying durations, producing time-series data. We have developed a new architecture capable…

    Submitted 26 June, 2021; originally announced June 2021.

    Comments: Accepted at the 2021 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)

  27. arXiv:2106.07456  [pdf, other]

    cs.AR

    Extending the RISC-V ISA for exploring advanced reconfigurable SIMD instructions

    Authors: Philippos Papaphilippou, Paul H. J. Kelly, Wayne Luk

    Abstract: This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count where applicable. Additionally, a high-performance open-source RISC-V (RV32 IM) softcore is introduced, optimised for exploring custom SIMD instructions and str…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: Accepted at the Fifth Workshop on Computer Architecture Research with RISC-V (CARRV 2021), co-located with ISCA 2021

  28. arXiv:2106.06048  [pdf, other]

    cs.LG

    Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator

    Authors: Martin Ferianc, Zhiqiang Que, Hongxiang Fan, Wayne Luk, Miguel Rodrigues

    Abstract: Neural networks have demonstrated their outstanding performance in a wide range of tasks. Specifically recurrent architectures based on long-short term memory (LSTM) cells have manifested excellent capability to model time dependencies in real-world data. However, standard recurrent architectures cannot estimate their uncertainty which is essential for safety-critical applications such as in medic…

    Submitted 7 November, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Accepted to FPT'21. Martin Ferianc and Zhiqiang Que share an equal contribution. Updated copyright footer

  29. arXiv:2105.09163  [pdf, other]

    cs.AR cs.LG eess.IV

    High-Performance FPGA-based Accelerator for Bayesian Neural Networks

    Authors: Hongxiang Fan, Martin Ferianc, Miguel Rodrigues, Hongyu Zhou, Xinyu Niu, Wayne Luk

    Abstract: Neural networks (NNs) have demonstrated their potential in a wide range of applications such as image recognition, decision making or recommendation systems. However, standard NNs are unable to capture their model uncertainty which is crucial for many safety-critical applications including healthcare and autonomous vehicles. In comparison, Bayesian neural networks (BNNs) are able to express uncert…

    Submitted 30 November, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

    Comments: Design Automation Conference (DAC) 2021

  30. arXiv:2009.02825  [pdf, other]

    cs.LG stat.ML

    An Analysis of Alternating Direction Method of Multipliers for Feed-forward Neural Networks

    Authors: Seyedeh Niusha Alavi Foumani, Ce Guo, Wayne Luk

    Abstract: In this work, we present a hardware compatible neural network training algorithm in which we used alternating direction method of multipliers (ADMM) and iterative least-square methods. The motive behind this approach was to conduct a method of training neural networks that is scalable and can be parallelised. These characteristics make this algorithm suitable for hardware implementation. We have a…

    Submitted 6 September, 2020; originally announced September 2020.

  31. arXiv:2009.02784  [pdf, other]

    cs.LG

    An FPGA Accelerated Method for Training Feed-forward Neural Networks Using Alternating Direction Method of Multipliers and LSMR

    Authors: Seyedeh Niusha Alavi Foumani, Ce Guo, Wayne Luk

    Abstract: In this project, we have successfully designed, implemented, deployed and tested a novel FPGA accelerated algorithm for neural network training. The algorithm itself was developed in an independent study option. This training method is based on Alternating Direction Method of Multipliers algorithm, which has strong parallel characteristics and avoids procedures such as matrix inversion that are pr…

    Submitted 6 September, 2020; originally announced September 2020.

  32. arXiv:2006.00493  [pdf, other]

    physics.acc-ph physics.bio-ph

    The Laser-hybrid Accelerator for Radiobiological Applications

    Authors: G. Aymar, T. Becker, S. Boogert, M. Borghesi, R. Bingham, C. Brenner, P. N. Burrows, T. Dascalu, O. C. Ettlinger, S. Gibson, T. Greenshaw, S. Gruber, D. Gujral, C. Hardiman, J. Hughes, W. G. Jones, K. Kirkby, A. Kurup, J-B. Lagrange, K. Long, W. Luk, J. Matheson, P. McKenna, R. Mclauchlan, Z. Najmudin, et al. (15 additional authors not shown)

    Abstract: The `Laser-hybrid Accelerator for Radiobiological Applications', LhARA, is conceived as a novel, uniquely-flexible facility dedicated to the study of radiobiology. The technologies demonstrated in LhARA, which have wide application, will be developed to allow particle-beam therapy to be delivered in a completely new regime, combining a variety of ion species in a single treatment fraction and expl…

    Submitted 31 May, 2020; originally announced June 2020.

    Comments: 36 pages, 11 figures, preprint submitted to Frontiers in Physics, Medical Physics and Imaging

  33. arXiv:2002.00190  [pdf, ps, other]

    eess.IV

    Improving Performance Estimation for FPGA-based Accelerators for Convolutional Neural Networks

    Authors: Martin Ferianc, Hongxiang Fan, Ringo S. W. Chu, Jakub Stano, Wayne Luk

    Abstract: Field-programmable gate array (FPGA) based accelerators are being widely used for acceleration of convolutional neural networks (CNNs) due to their potential in improving the performance and reconfigurability for specific application instances. To determine the optimal configuration of an FPGA-based accelerator, it is necessary to explore the design space and an accurate performance prediction pla…

    Submitted 1 February, 2020; originally announced February 2020.

    Comments: This article is accepted for publication at ARC'2020

  34. arXiv:2001.10605  [pdf]

    cs.NE eess.AS q-bio.NC

    Learning spatial hearing via innate mechanisms

    Authors: Yang Chu, Wayne Luk, Dan Goodman

    Abstract: The acoustic cues used by humans and other animals to localise sounds are subtle, and change during and after development. This means that we need to constantly relearn or recalibrate the auditory spatial map throughout our lifetimes. This is often thought of as a "supervised" learning process where a "teacher" (for example, a parent, or your visual system) tells you whether or not you guessed the…

    Submitted 16 April, 2025; v1 submitted 28 January, 2020; originally announced January 2020.

  35. arXiv:1908.07516  [pdf, other]

    eess.IV cs.CV physics.med-ph

    DirectPET: Full Size Neural Network PET Reconstruction from Sinogram Data

    Authors: William Whiteley, Wing K. Luk, Jens Gregor

    Abstract: Purpose: Neural network image reconstruction directly from measurement data is a relatively new field of research, that until now has been limited to producing small single-slice images (e.g., 1x128x128). This paper proposes a novel and more efficient network design for Positron Emission Tomography called DirectPET which is capable of reconstructing multi-slice image volumes (i.e., 16x400x400) fro…

    Submitted 11 February, 2020; v1 submitted 19 August, 2019; originally announced August 2019.

    Comments: Submitted to the Journal of Medical Imaging

  36. arXiv:1906.11981  [pdf, other]

    cs.CV cs.LG eess.IV

    Convolution Based Spectral Partitioning Architecture for Hyperspectral Image Classification

    Authors: Ringo S. W. Chu, Ho-Cheung Ng, Xiwei Wang, Wayne Luk

    Abstract: Hyperspectral images (HSIs) can distinguish materials with high number of spectral bands, which is widely adopted in remote sensing applications and benefits in high accuracy land cover classifications. However, HSIs processing are tangled with the problem of high dimensionality and limited amount of labelled data. To address these challenges, this paper proposes a deep learning architecture using…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: Accepted for publication in IGARSS'2019

  37. arXiv:1906.11834  [pdf, other]

    eess.IV cs.CV

    Optimizing CNN-based Hyperspectral Image Classification on FPGAs

    Authors: Shuanglong Liu, Ringo S. W. Chu, Xiwei Wang, Wayne Luk

    Abstract: Hyperspectral image (HSI) classification has been widely adopted in applications involving remote sensing imagery analysis which require high classification accuracy and real-time processing speed. Methods based on Convolutional neural networks (CNNs) have been proven to achieve state-of-the-art accuracy in classifying HSIs. However, CNN models are often too computationally intensive to achieve re…

    Submitted 27 June, 2019; originally announced June 2019.

    Comments: This article is accepted for publication at ARC'2019

  38. arXiv:1906.00877  [pdf, other]

    cs.AR

    Pangloss: a novel Markov chain prefetcher

    Authors: Philippos Papaphilippou, Paul H. J. Kelly, Wayne Luk

    Abstract: We present Pangloss, an efficient high-performance data prefetcher that approximates a Markov chain on delta transitions. With a limited information scope and space/logic complexity, it is able to reconstruct a variety of both simple and complex access patterns. This is achieved by a highly-efficient representation of the Markov chain to provide accurate values for transition probabilities. In add…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: Accepted in The Third Data Prefetching Championship (DPC3), held in conjunction with ISCA 2019

  39. Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going

    Authors: Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides

    Abstract: Deep neural networks have proven to be particularly effective in visual and audio recognition tasks. Existing models tend to be computationally expensive and memory intensive, however, and so methods for hardware-oriented approximation have become a hot topic. Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms…

    Submitted 8 July, 2019; v1 submitted 21 January, 2019; originally announced January 2019.

    Comments: Accepted manuscript uploaded 21/01/19. DOA 15/01/19

    Journal ref: ACM Comput. Surv. 52, 2, Article 40 (May 2019), 39 pages

  40. arXiv:1811.09341  [pdf, other]

    cs.CV

    Efficient Structured Pruning and Architecture Searching for Group Convolution

    Authors: Ruizhe Zhao, Wayne Luk

    Abstract: Efficient inference of Convolutional Neural Networks is a thriving topic recently. It is desirable to achieve the maximal test accuracy under given inference budget constraints when deploying a pre-trained model. Network pruning is a commonly used technique while it may produce irregular sparse models that can hardly gain actual speed-up. Group convolution is a promising pruning target due to its…

    Submitted 28 October, 2019; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: Published as an ICCV'19 NEUARCH workshop paper

  41. arXiv:1809.03318  [pdf, other]

    cs.CV

    Towards Efficient Convolutional Neural Network for Domain-Specific Applications on FPGA

    Authors: Ruizhe Zhao, Ho-Cheung Ng, Wayne Luk, Xinyu Niu

    Abstract: FPGA becomes a popular technology for implementing Convolutional Neural Network (CNN) in recent years. Most CNN applications on FPGA are domain-specific, e.g., detecting objects from specific categories, in which commonly-used CNN models pre-trained on general datasets may not be efficient enough. This paper presents TuRF, an end-to-end CNN acceleration framework to efficiently deploy domain-speci…

    Submitted 4 September, 2018; originally announced September 2018.

  42. arXiv:1509.09038  [pdf, other]

    physics.ins-det hep-ex

    Measurement of Cosmic-ray Muons and Muon-induced Neutrons in the Aberdeen Tunnel Underground Laboratory

    Authors: S. C. Blyth, Y. L. Chan, X. C. Chen, M. C. Chu, K. X. Cui, R. L. Hahn, T. H. Ho, Y. K. Hor, Y. B. Hsiung, B. Z. Hu, K. K. Kwan, M. W. Kwok, T. Kwok, Y. P. Lau, K. P. Lee, J. K. C. Leung, K. Y. Leung, G. L. Lin, Y. C. Lin, K. B. Luk, W. H. Luk, H. Y. Ngai, W. K. Ngai, S. Y. Ngan, C. S. J. Pun, et al. (9 additional authors not shown)

    Abstract: We have measured the muon flux and production rate of muon-induced neutrons at a depth of 611 m water equivalent. Our apparatus comprises three layers of crossed plastic scintillator hodoscopes for tracking the incident cosmic-ray muons and 760 L of gadolinium-doped liquid scintillator for producing and detecting neutrons. The vertical muon intensity was measured to be…

    Submitted 26 November, 2016; v1 submitted 30 September, 2015; originally announced September 2015.

    Comments: 14 pages, 17 figures, 3 tables

    Journal ref: Phys. Rev. D 93, 072005 (2016)

  43. arXiv:1506.06684  [pdf]

    cs.DC

    Seeing Shapes in Clouds: On the Performance-Cost trade-off for Heterogeneous Infrastructure-as-a-Service

    Authors: Gordon Inggs, David B. Thomas, George Constantinides, Wayne Luk

    Abstract: In the near future FPGAs will be available by the hour, however this new Infrastructure as a Service (IaaS) usage mode presents both an opportunity and a challenge: The opportunity is that programmers can potentially trade resources for performance on a much larger scale, for much shorter periods of time than before. The challenge is in finding and traversing the trade-off for heterogeneous IaaS t…

    Submitted 27 August, 2015; v1 submitted 22 June, 2015; originally announced June 2015.

    Comments: Presented at Second International Workshop on FPGAs for Software Programmers (FSP 2015) (arXiv:1508.06320)

    Report number: FSP/2015/10

  44. arXiv:1505.04417  [pdf, other]

    cs.DC cs.CE

    A Domain Specific Approach to High Performance Heterogeneous Computing

    Authors: Gordon Inggs, David B. Thomas, Wayne Luk

    Abstract: Users of heterogeneous computing systems face two problems: firstly, in understanding the trade-off relationships between the observable characteristics of their applications, such as latency and quality of the result, and secondly, how to exploit knowledge of these characteristics to allocate work to distributed computing platforms efficiently. A domain specific approach addresses both of these p…

    Submitted 14 March, 2016; v1 submitted 17 May, 2015; originally announced May 2015.

    Comments: 14 pages, preprint draft, minor revision

  45. arXiv:1408.4965  [pdf]

    cs.CE cs.DC cs.PF cs.PL

    A Domain Specific Approach to Heterogeneous Computing: From Availability to Accessibility

    Authors: Gordon Inggs, David Thomas, Wayne Luk

    Abstract: We advocate a domain specific software development methodology for heterogeneous computing platforms such as Multicore CPUs, GPUs and FPGAs. We argue that three specific benefits are realised from adopting such an approach: portable, efficient implementations across heterogeneous platforms; domain specific metrics of quality that characterise platforms in a form software developers will understand…

    Submitted 21 August, 2014; originally announced August 2014.

    Comments: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

    Report number: FSP/2014/11

  46. arXiv:1308.2924  [pdf, other]

    physics.ins-det hep-ex

    An apparatus for studying spallation neutrons in the Aberdeen Tunnel laboratory

    Authors: S. C. Blyth, Y. L. Chan, X. C. Chen, M. C. Chu, R. L. Hahn, T. H. Ho, Y. B. Hsiung, B. Z. Hu, K. K. Kwan, M. W. Kwok, T. Kwok, Y. P. Lau, K. P. Lee, J. K. C. Leung, K. Y. Leung, G. L. Lin, Y. C. Lin, K. B. Luk, W. H. Luk, H. Y. Ngai, S. Y. Ngan, C. S. J. Pun, K. Shih, Y. H. Tam, R. H. M. Tsang, et al. (6 additional authors not shown)

    Abstract: In this paper, we describe the design, construction and performance of an apparatus installed in the Aberdeen Tunnel laboratory in Hong Kong for studying spallation neutrons induced by cosmic-ray muons under a vertical rock overburden of 611 meter water equivalent (m.w.e.). The apparatus comprises of six horizontal layers of plastic-scintillator hodoscopes for determining the direction and positio…

    Submitted 13 August, 2013; originally announced August 2013.

    Journal ref: Nuclear Inst. and Methods in Physics Research, A (2013), pp. 67-82

  47. arXiv:0710.4845  [pdf]

    cs.AR

    Evaluation of SystemC Modelling of Reconfigurable Embedded Systems

    Authors: Tero Rissa, Adam Donlin, Wayne Luk

    Abstract: This paper evaluates the use of pin and cycle accurate SystemC models for embedded system design exploration and early software development. The target system is MicroBlaze VanillaNet Platform running MicroBlaze uClinux operating system. The paper compares Register Transfer Level (RTL) Hardware Description Language (HDL) simulation speed to the simulation speed of several different SystemC model…

    Submitted 25 October, 2007; originally announced October 2007.

    Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

    Journal ref: In Design, Automation and Test in Europe | Designers' Forum - DATE'05, Munich, Germany (2005)