Showing 1–6 of 6 results for author: Fujiki, D

Searching in archive cs.

  1. DX100: A Programmable Data Access Accelerator for Indirection

    Authors: Alireza Khadem, Kamalavasan Kamalakkannan, Zhenyan Zhu, Akash Poptani, Yufeng Gu, Jered Benjamin Dominguez-Trujillo, Nishil Talati, Daichi Fujiki, Scott Mahlke, Galen Shipman, Reetuparna Das

    Abstract: Indirect memory accesses frequently appear in applications where memory bandwidth is a critical bottleneck. Prior indirect memory access proposals, such as indirect prefetchers, runahead execution, fetchers, and decoupled access/execute architectures, primarily focus on improving memory access latency by loading data ahead of computation but still rely on the DRAM controllers to reorder memory req…

    Submitted 2 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

    Comments: Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA 2025)

  2. arXiv:2501.09902

    cs.AR

    Multi-Dimensional Vector ISA Extension for Mobile In-Cache Computing

    Authors: Alireza Khadem, Daichi Fujiki, Hilbert Chen, Yufeng Gu, Nishil Talati, Scott Mahlke, Reetuparna Das

    Abstract: In-cache computing technology transforms existing caches into long-vector compute units and offers low-cost alternatives to building expensive vector engines for mobile CPUs. Unfortunately, existing long-vector Instruction Set Architecture (ISA) extensions, such as RISC-V Vector Extension (RVV) and Arm Scalable Vector Extension (SVE), provide only one-dimensional strided and random memory accesses…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 2025 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  3. arXiv:2402.14029

    cs.LG cs.AI stat.ML

    Partially Frozen Random Networks Contain Compact Strong Lottery Tickets

    Authors: Hikari Otsuka, Daiki Chijiwa, Ángel López García-Arias, Yasuyuki Okoshi, Kazushi Kawamura, Thiem Van Chu, Daichi Fujiki, Susumu Takeuchi, Masato Motomura

    Abstract: Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning--strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs could also be found within a randomly pruned source network. This phenomenon can be exploited to further compress the small memory size required by SLTs. However, their method is limited to SLTs that are e…

    Submitted 8 February, 2025; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted at TMLR

  4. arXiv:2312.06086

    cs.AR

    HALO-CAT: A Hidden Network Processor with Activation-Localized CIM Architecture and Layer-Penetrative Tiling

    Authors: Yung-Chin Chen, Shimpei Ando, Daichi Fujiki, Shinya Takamaeda-Yamazaki, Kentaro Yoshioka

    Abstract: To address the 'memory wall' problem in NN hardware acceleration, we introduce HALO-CAT, a software-hardware co-design optimized for Hidden Neural Network (HNN) processing. HALO-CAT integrates Layer-Penetrative Tiling (LPT) for algorithmic efficiency, reducing intermediate result sizes. Furthermore, the architecture employs an activation-localized computing-in-memory approach to minimize data move…

    Submitted 10 December, 2023; originally announced December 2023.

  5. arXiv:2309.02680

    cs.AR

    Vector-Processing for Mobile Devices: Benchmark and Analysis

    Authors: Alireza Khadem, Daichi Fujiki, Nishil Talati, Scott Mahlke, Reetuparna Das

    Abstract: Vector processing has become commonplace in today's CPU microarchitectures. Vector instructions improve performance and energy, which is crucial for resource-constrained mobile devices. The research community currently lacks a comprehensive benchmark suite to study the benefits of vector processing for mobile devices. This paper presents Swan, an extensive vector processing benchmark suite for mobile…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 2023 IEEE International Symposium on Workload Characterization (IISWC)

  6. arXiv:2308.15040

    cs.AR

    OSA-HCIM: On-The-Fly Saliency-Aware Hybrid SRAM CIM with Dynamic Precision Configuration

    Authors: Yung-Chin Chen, Shimpei Ando, Daichi Fujiki, Shinya Takamaeda-Yamazaki, Kentaro Yoshioka

    Abstract: Computing-in-Memory (CIM) has shown great potential for enhancing efficiency and performance for deep neural networks (DNNs). However, the lack of flexibility in CIM leads to an unnecessary expenditure of computational resources on less critical operations, and a diminished Signal-to-Noise Ratio (SNR) when handling more complex tasks, significantly hindering the overall performance. Hence, we focu…

    Submitted 21 November, 2023; v1 submitted 29 August, 2023; originally announced August 2023.