
Showing 1–14 of 14 results for author: Takamaeda-Yamazaki, S

  1. arXiv:2505.23246  [pdf, ps, other]

    cs.LG

    Measuring Participant Contributions in Decentralized Federated Learning

    Authors: Honoka Anada, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki

    Abstract: Federated learning (FL) enables multiple clients to collaboratively train models without sharing their data. Measuring participant contributions in FL is crucial for incentivizing clients and ensuring transparency. While various methods have been proposed for contribution measurement, they are designed exclusively for centralized federated learning (CFL), where a central server collects and aggreg…

    Submitted 29 May, 2025; originally announced May 2025.

  2. arXiv:2505.05266  [pdf, other]

    cs.AR cs.DC

    PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM

    Authors: Tatsuya Kubo, Daichi Tokuda, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki

    Abstract: Recently, practical analog in-memory computing has been realized using unmodified commercial DRAM modules. The underlying Processing-Using-DRAM (PUD) techniques enable high-throughput bitwise operations directly within DRAM arrays. However, the presence of inherent error-prone columns hinders PUD's practical adoption. While selectively using only error-free columns would ensure reliability, this a…

    Submitted 8 May, 2025; originally announced May 2025.

  3. arXiv:2503.23817  [pdf, other]

    cs.AR cs.DC

    MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration

    Authors: Tatsuya Kubo, Daichi Tokuda, Tomoya Nagatani, Masayuki Usui, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki

    Abstract: General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models. Processing-Using-DRAM (PUD), an analog in-DRAM computing technique, has the potential to repurpose on-device DRAM as a GeMV engine, offering additional high-throughput processing capabilities to widespread consumer devices without DRAM modif…

    Submitted 31 March, 2025; originally announced March 2025.

  4. PRIOT: Pruning-Based Integer-Only Transfer Learning for Embedded Systems

    Authors: Honoka Anada, Sefutsu Ryu, Masayuki Usui, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki

    Abstract: On-device transfer learning is crucial for adapting a common backbone model to the unique environment of each edge device. Tiny microcontrollers, such as the Raspberry Pi Pico, are key targets for on-device learning but often lack floating-point units, necessitating integer-only training. Dynamic computation of quantization scale factors, which is adopted in former studies, incurs high computation…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: Accepted for publication in IEEE Embedded Systems Letters

  5. arXiv:2502.11848  [pdf, other]

    cs.PL

    RustSFQ: A Domain-Specific Language for SFQ Circuit Design

    Authors: Mebuki Oishi, Sun Tanaka, Shinya Takamaeda-Yamazaki

    Abstract: Cell-based design of a single-flux-quantum (SFQ) digital circuit requires input-output consistency; every output signal must be consumed only once by the input of the following component, which is a unique constraint, unlike the traditional CMOS digital circuit design. While there are some cell libraries and simulation tools for SFQ circuit development, they do not verify the input-output consiste…

    Submitted 17 February, 2025; originally announced February 2025.

  6. arXiv:2502.11782  [pdf, other]

    cs.AR

    Exploring the Versal AI Engine for 3D Gaussian Splatting

    Authors: Kotaro Shimamura, Ayumi Ohno, Shinya Takamaeda-Yamazaki

    Abstract: Dataflow-oriented spatial architectures are the emerging paradigm for higher computation performance and efficiency. AMD Versal AI Engine is a commercial spatial architecture consisting of tiles of VLIW processors supporting SIMD operations arranged in a two-dimensional mesh. The architecture requires the explicit design of task assignments and dataflow configurations for each tile to maximize…

    Submitted 17 February, 2025; originally announced February 2025.

  7. arXiv:2502.11660  [pdf, other]

    cs.AR

    Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication

    Authors: Ayumi Ohno, Kotaro Shimamura, Shinya Takamaeda-Yamazaki

    Abstract: Multi-scalar multiplication (MSM) is crucial in cryptographic applications and computationally intensive in zero-knowledge proofs. MSM involves accumulating the products of scalars and points on an elliptic curve over a 377-bit modulus, and the Pippenger algorithm converts MSM into a series of elliptic curve point additions (PADDs) with high parallelism. This study investigates accelerating MSM on…

    Submitted 17 February, 2025; originally announced February 2025.

  8. arXiv:2411.01161  [pdf, other]

    stat.ML cs.CR cs.LG

    Federated Learning with Relative Fairness

    Authors: Shogo Nakakita, Tatsuya Kaneko, Shinya Takamaeda-Yamazaki, Masaaki Imaizumi

    Abstract: This paper proposes a federated learning framework designed to achieve \textit{relative fairness} for clients. Traditional federated learning frameworks typically ensure absolute fairness by guaranteeing minimum performance across all client subgroups. However, this approach overlooks disparities in model performance between subgroups. The proposed framework uses a minimax problem approach to mini…

    Submitted 2 November, 2024; originally announced November 2024.

    Comments: 43 pages

  9. PACiM: A Sparsity-Centric Hybrid Compute-in-Memory Architecture via Probabilistic Approximation

    Authors: Wenlun Zhang, Shimpei Ando, Yung-Chin Chen, Satomi Miyagi, Shinya Takamaeda-Yamazaki, Kentaro Yoshioka

    Abstract: Approximate computing emerges as a promising approach to enhance the efficiency of compute-in-memory (CiM) systems in deep neural network processing. However, traditional approximate techniques often significantly trade off accuracy for power efficiency, and fail to reduce data transfer between main memory and CiM banks, which dominates power consumption. This paper introduces a novel probabilisti…

    Submitted 28 August, 2024; originally announced August 2024.

    Journal ref: IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2024)

  10. arXiv:2312.06086  [pdf, other]

    cs.AR

    HALO-CAT: A Hidden Network Processor with Activation-Localized CIM Architecture and Layer-Penetrative Tiling

    Authors: Yung-Chin Chen, Shimpei Ando, Daichi Fujiki, Shinya Takamaeda-Yamazaki, Kentaro Yoshioka

    Abstract: To address the 'memory wall' problem in NN hardware acceleration, we introduce HALO-CAT, a software-hardware co-design optimized for Hidden Neural Network (HNN) processing. HALO-CAT integrates Layer-Penetrative Tiling (LPT) for algorithmic efficiency, reducing intermediate result sizes. Furthermore, the architecture employs an activation-localized computing-in-memory approach to minimize data move…

    Submitted 10 December, 2023; originally announced December 2023.

  11. arXiv:2308.15040  [pdf, other]

    cs.AR

    OSA-HCIM: On-The-Fly Saliency-Aware Hybrid SRAM CIM with Dynamic Precision Configuration

    Authors: Yung-Chin Chen, Shimpei Ando, Daichi Fujiki, Shinya Takamaeda-Yamazaki, Kentaro Yoshioka

    Abstract: Computing-in-Memory (CIM) has shown great potential for enhancing efficiency and performance for deep neural networks (DNNs). However, the lack of flexibility in CIM leads to an unnecessary expenditure of computational resources on less critical operations, and a diminished Signal-to-Noise Ratio (SNR) when handling more complex tasks, significantly hindering the overall performance. Hence, we focu…

    Submitted 21 November, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  12. FADEC: FPGA-based Acceleration of Video Depth Estimation by HW/SW Co-design

    Authors: Nobuho Hashimoto, Shinya Takamaeda-Yamazaki

    Abstract: 3D reconstruction from videos has become increasingly popular for various applications, including navigation for autonomous driving of robots and drones, augmented reality (AR), and 3D modeling. This task often combines traditional image/video processing algorithms and deep neural networks (DNNs). Although recent developments in deep learning have improved the accuracy of the task, the large numbe…

    Submitted 16 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: 9 pages, 8 figures, 3 tables, FPT 2022 (Full paper), Program: https://fpt22.hkust.edu.hk/program#tools, GitHub: https://github.com/casys-utokyo/fadec, Slides: https://speakerdeck.com/hashi0203/sw-co-design-fpt-2022-8082a83d-3167-461c-8560-60f77959a3d5, Movie: https://youtu.be/NFULXQeu6Vw, Profile: https://n-hassy.info

  13. arXiv:2211.13402  [pdf, other]

    cs.LG

    MP-GELU Bayesian Neural Networks: Moment Propagation by GELU Nonlinearity

    Authors: Yuki Hirayama, Sinya Takamaeda-Yamazaki

    Abstract: Bayesian neural networks (BNNs) have been an important framework in the study of uncertainty quantification. Deterministic variational inference, one of the inference methods, utilizes moment propagation to compute the predictive distributions and objective functions. Unfortunately, deriving the moments requires computationally expensive Taylor expansion in nonlinear functions, such as a rectified…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 9 pages, 1 figure

  14. An FPGA-Based Fully Pipelined Bilateral Grid for Real-Time Image Denoising

    Authors: Nobuho Hashimoto, Shinya Takamaeda-Yamazaki

    Abstract: The bilateral filter (BF) is widely used in image processing because it can perform denoising while preserving edges. It has disadvantages in that it is nonlinear, and its computational complexity and hardware resources are directly proportional to its window size. Thus far, several approximation methods and hardware implementations have been proposed to solve these problems. However, processing l…

    Submitted 13 December, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 7 pages, 12 figures, 2 tables, FPL 2021 (Full paper), Program and Abstract: https://whova.com/embedded/session/kScht29Q8lAG98UvGiEh7UVYTNeJssTMevISW%407S-oU%3D/1837318/, Slides: https://speakerdeck.com/hashi0203/an-fpga-based-fully-pipelined-bilateral-grid-for-real-time-image-denoising-fpl-2021, Movie: https://youtu.be/q5lxi7N-uX8, Profile: https://n-hassy.info