Instruction Scheduling in the Saturn Vector Unit
Authors:
Jerry Zhao,
Daniel Grubb,
Miles Rusch,
Tianrui Wei,
Kevin Anderson,
Borivoje Nikolic,
Krste Asanovic
Abstract:
While the challenges and solutions for efficient execution of scalable vector ISAs on long-vector-length microarchitectures have been well established, not all of these solutions are suitable for short-vector-length implementations. This work proposes a novel microarchitecture for instruction sequencing in vector units with short architectural vector lengths. The proposed microarchitecture supports fine-granularity chaining, multi-issue out-of-order execution, zero dead-time, and run-ahead memory accesses at low area and complexity cost. We present the Saturn Vector Unit, an RTL implementation of an RVV vector unit. With our instruction scheduling mechanism, Saturn exhibits comparable or superior power, performance, and area characteristics compared to state-of-the-art long-vector and short-vector implementations.
Submitted 1 December, 2024;
originally announced December 2024.
Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration
Authors:
Hasan Genc,
Seah Kim,
Alon Amid,
Ameer Haj-Ali,
Vighnesh Iyer,
Pranav Prakash,
Jerry Zhao,
Daniel Grubb,
Harrison Liew,
Howard Mao,
Albert Ou,
Colin Schmidt,
Samuel Steffl,
John Wright,
Ion Stoica,
Jonathan Ragan-Kelley,
Krste Asanovic,
Borivoje Nikolic,
Yakun Sophia Shao
Abstract:
DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source*, full-stack DNN accelerator generator. Gemmini generates a wide design-space of efficient ASIC accelerators from a flexible architectural template, together with flexible programming stacks and full SoCs with shared resources that capture system-level effects. Gemmini-generated accelerators have also been fabricated, delivering up to three orders-of-magnitude speedups over high-performance CPUs on various DNN benchmarks.
* https://github.com/ucb-bar/gemmini
Submitted 9 July, 2021; v1 submitted 22 November, 2019;
originally announced November 2019.