Efficiency, Expressivity, and Extensibility in a Close-to-Metal NPU Programming Interface
Authors:
Erika Hunhoff,
Joseph Melber,
Kristof Denolf,
Andra Bisca,
Samuel Bayliss,
Stephen Neuendorffer,
Jeff Fifield,
Jack Lo,
Pranathi Vasireddy,
Phil James-Roxby,
Eric Keller
Abstract:
Accelerators such as neural processing units (NPUs) deliver an enticing balance of performance and efficiency compared to general purpose compute architectures. However, effectively leveraging accelerator capabilities is not always simple: low-level programming toolkits may require substantial developer effort while high-level programming toolkits may abstract critical optimization features.
This work aims to increase the efficiency of designers using IRON, a toolkit for close-to-metal NPU performance engineers. We provide an updated programmer interface to IRON containing new and refined programming constructs. The new interface includes extensible features for placement and data transformation. These contributions are evaluated in terms of 1) efficiency, with analysis showing a ~26% average reduction in lines of code and decreases in Halstead metrics for a variety of designs; 2) expressivity, demonstrating the new interface supports the wide range of features and patterns already supported by IRON; and 3) extensibility, illustrating the new tooling for placement and tiling can be extended to accommodate common use cases.
Submitted 25 April, 2025;
originally announced April 2025.
Proactive Serverless Function Resource Management
Authors:
Erika Hunhoff,
Shazal Irshad,
Vijay Thurimella,
Ali Tariq,
Eric Rozner
Abstract:
This paper introduces a new primitive to serverless language runtimes called freshen. With freshen, developers or providers specify functionality to perform before a given function executes. This proactive technique allows for overheads associated with serverless functions to be mitigated at execution time, which improves function responsiveness. We show various predictive opportunities exist to run freshen within reasonable time windows. A high-level design and implementation are described, along with preliminary results to show the potential benefits of our scheme.
Submitted 8 October, 2020;
originally announced October 2020.