-
TriADA: Massively Parallel Trilinear Matrix-by-Tensor Multiply-Add Algorithm and Device Architecture for the Acceleration of 3D Discrete Transformations
Authors:
Stanislav Sedukhin,
Yoichi Tomioka,
Kazuya Matsumoto,
Yuichi Okuyama
Abstract:
Multilinear transformations are key in high-performance computing (HPC) and artificial intelligence (AI) workloads, where data is represented as tensors. However, their high computational and memory demands, which grow with dimensionality, often slow down critical tasks. Moreover, scaling computation by enlarging the number of parallel processing units substantially increases energy consumption, l…
▽ More
Multilinear transformations are key in high-performance computing (HPC) and artificial intelligence (AI) workloads, where data is represented as tensors. However, their high computational and memory demands, which grow with dimensionality, often slow down critical tasks. Moreover, scaling computation by enlarging the number of parallel processing units substantially increases energy consumption, limiting widespread adoption, especially for sparse data, which is common in HPC and AI applications. This paper introduces the Trilinear Algorithm and isomorphic to algorithm Device Architecture (TriADA) to address these challenges with the following innovations: (1) a massively parallel, low-rank algorithm for computing a family of trilinear (3D) discrete orthogonal transformations (3D-DXTs), which is a special case of the more general 3-mode matrix-by-tensor multiplication (3D-GEMT); (2) a new outer-product-based GEMM kernel with decoupled streaming active memory, specially designed to accelerate 3D-GEMT operation; (3) an isomorphic to the proposed algorithm, fully distributed 3D network of mesh interconnected processing elements or cells with a coordinate-free, data-driven local processing activity, which is independent of problem size; (4) an elastic sparse outer-product (ESOP) method that avoids unnecessary computing and communication operations with zero-valued operands, thereby enhancing energy efficiency, computational accuracy, and stability. TriADA is capable of performing a variety of trilinear transformations with hypercubic arithmetic complexity in a linear number of time-steps. The massively parallel, scalable, and energy-efficient architecture of TriADA is ideal for accelerating multilinear tensor operations, which are the most demanding parts of AI and HPC workloads.
△ Less
Submitted 28 June, 2025;
originally announced June 2025.
-
DiffBody: Diffusion-based Pose and Shape Editing of Human Images
Authors:
Yuta Okuyama,
Yuki Endo,
Yoshihiro Kanamori
Abstract:
Pose and body shape editing in a human image has received increasing attention. However, current methods often struggle with dataset biases and deteriorate realism and the person's identity when users make large edits. We propose a one-shot approach that enables large edits with identity preservation. To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and cha…
▽ More
Pose and body shape editing in a human image has received increasing attention. However, current methods often struggle with dataset biases and deteriorate realism and the person's identity when users make large edits. We propose a one-shot approach that enables large edits with identity preservation. To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and change the body's pose and shape. Because this initial textured body model has artifacts due to occlusion and the inaccurate body shape, the rendered image undergoes a diffusion-based refinement, in which strong noise destroys body structure and identity whereas insufficient noise does not help. We thus propose an iterative refinement with weak noise, applied first for the whole body and then for the face. We further enhance the realism by fine-tuning text embeddings via self-supervised learning. Our quantitative and qualitative evaluations demonstrate that our method outperforms other existing methods across various datasets.
△ Less
Submitted 7 January, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
QAmplifyNet: Pushing the Boundaries of Supply Chain Backorder Prediction Using Interpretable Hybrid Quantum-Classical Neural Network
Authors:
Md Abrar Jahin,
Md Sakib Hossain Shovon,
Md. Saiful Islam,
Jungpil Shin,
M. F. Mridha,
Yuichi Okuyama
Abstract:
Supply chain management relies on accurate backorder prediction for optimizing inventory control, reducing costs, and enhancing customer satisfaction. However, traditional machine-learning models struggle with large-scale datasets and complex relationships, hindering real-world data collection. This research introduces a novel methodological framework for supply chain backorder prediction, address…
▽ More
Supply chain management relies on accurate backorder prediction for optimizing inventory control, reducing costs, and enhancing customer satisfaction. However, traditional machine-learning models struggle with large-scale datasets and complex relationships, hindering real-world data collection. This research introduces a novel methodological framework for supply chain backorder prediction, addressing the challenge of handling large datasets. Our proposed model, QAmplifyNet, employs quantum-inspired techniques within a quantum-classical neural network to predict backorders effectively on short and imbalanced datasets. Experimental evaluations on a benchmark dataset demonstrate QAmplifyNet's superiority over classical models, quantum ensembles, quantum neural networks, and deep reinforcement learning. Its proficiency in handling short, imbalanced datasets makes it an ideal solution for supply chain management. To enhance model interpretability, we use Explainable Artificial Intelligence techniques. Practical implications include improved inventory control, reduced backorders, and enhanced operational efficiency. QAmplifyNet seamlessly integrates into real-world supply chain management systems, enabling proactive decision-making and efficient resource allocation. Future work involves exploring additional quantum-inspired techniques, expanding the dataset, and investigating other supply chain applications. This research unlocks the potential of quantum computing in supply chain optimization and paves the way for further exploration of quantum-inspired machine learning models in supply chain management. Our framework and QAmplifyNet model offer a breakthrough approach to supply chain backorder prediction, providing superior performance and opening new avenues for leveraging quantum-inspired techniques in supply chain management.
△ Less
Submitted 15 October, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
A low-overhead soft-hard fault-tolerant architecture, design and management scheme for reliable high-performance many-core 3D-NoC systems
Authors:
Khanh N Dang,
Michael Meyer,
Yuichi Okuyama,
Abderazek Ben Abdallah
Abstract:
The Network-on-Chip (NoC) paradigm has been proposed as a favorable solution to handle the strict communication requirements between the increasingly large number of cores on a single chip. However, NoC systems are exposed to the aggressive scaling down of transistors, low operating voltages, and high integration and power densities, making them vulnerable to permanent (hard) faults and transient…
▽ More
The Network-on-Chip (NoC) paradigm has been proposed as a favorable solution to handle the strict communication requirements between the increasingly large number of cores on a single chip. However, NoC systems are exposed to the aggressive scaling down of transistors, low operating voltages, and high integration and power densities, making them vulnerable to permanent (hard) faults and transient (soft) errors. A hard fault in a NoC can lead to external blocking, causing congestion across the whole network. A soft error is more challenging because of its silent data corruption, which leads to a large area of erroneous data due to error propagation, packet re-transmission, and deadlock. In this paper, we present the architecture and design of a comprehensive soft error and hard fault-tolerant 3D-NoC system, named 3D-Hard-Fault-Soft-Error-Tolerant-OASIS-NoC (3D-FETO). With the aid of efficient mechanisms and algorithms, 3D-FETO is capable of detecting and recovering from soft errors which occur in the routing pipeline stages and leverages reconfigurable components to handle permanent faults in links, input buffers, and crossbars. In-depth evaluation results show that the 3D-FETO system is able to work around different kinds of hard faults and soft errors, ensuring graceful performance degradation, while minimizing additional hardware complexity and remaining power efficient.
△ Less
Submitted 21 March, 2020;
originally announced March 2020.
-
Reliability Assessment and Quantitative Evaluation of Soft-Error Resilient 3D Network-on-Chip Systems
Authors:
Khanh N Dang,
Michael Meyer,
Yuichi Okuyama,
Abderazek Ben Abdallah
Abstract:
Three-Dimensional Networks-on-Chips (3D-NoCs) have been proposed as an auspicious solution, merging the high parallelism of the Network-on-Chip (NoC) paradigm with the high-performance and low-power cost of 3D-ICs. However, as technology scales down, the reliability issues are becoming more crucial, especially for complex 3D-NoC which provides the communication requirements of multi and many-core…
▽ More
Three-Dimensional Networks-on-Chips (3D-NoCs) have been proposed as an auspicious solution, merging the high parallelism of the Network-on-Chip (NoC) paradigm with the high-performance and low-power cost of 3D-ICs. However, as technology scales down, the reliability issues are becoming more crucial, especially for complex 3D-NoC which provides the communication requirements of multi and many-core systems-on-chip. Reliability assessment is prominent for early stages of the manufacturing process to prevent costly redesigns of a target system. In this paper, we present an accurate reliability assessment and quantitative evaluation of a soft-error resilient 3D-NoC based on a soft-error resilient mechanism. The system can recover from transient errors occurring in different pipeline stages of the router. Based on this analysis, the effects of failures in the network's principal components are determined.
△ Less
Submitted 21 March, 2020;
originally announced March 2020.
-
Soft-Error and Hard-fault Tolerant Architecture and Routing Algorithm for Reliable 3D-NoC Systems
Authors:
Khanh N. Dang,
Yuichi Okuyama,
Abderazek Ben Abdallah
Abstract:
Network-on-Chip (NoC) paradigm has been proposed as an auspicious solution to handle the strict communication requirements between the increasingly large number of cores on a single multi and many-core chips. However, NoC systems are exposed to a variety of manufacturing, design and energetic particles factors making them vulnerable to permanent (hard) faults and transient (soft) errors. In this p…
▽ More
Network-on-Chip (NoC) paradigm has been proposed as an auspicious solution to handle the strict communication requirements between the increasingly large number of cores on a single multi and many-core chips. However, NoC systems are exposed to a variety of manufacturing, design and energetic particles factors making them vulnerable to permanent (hard) faults and transient (soft) errors. In this paper, we present a comprehensive soft error and hard fault tolerant 3D-NoC architecture, named 3D-Hard-Fault-Soft-Error-Tolerant-OASIS-NoC (3D-FETO). With the aid of adaptive algorithms, 3D-FETO is capable of detecting and recovering from soft errors occurring in the routing pipeline stages and is leveraging on reconfigurable components to handle permanent faults occurrence in links, input buffers, and crossbar. In-depth evaluation results show that the 3D-FETO system is able to work around different kinds of hard faults and soft errors while ensuring graceful performance degradation, minimizing the additional hardware complexity and remaining power-efficient.
△ Less
Submitted 21 March, 2020;
originally announced March 2020.