Ironman: Accelerating Oblivious Transfer Extension for Privacy-Preserving AI with Near-Memory Processing

Lin, Chenqi; Yang, Kang; Xu, Tianshi; Liang, Ling; Wang, Yufei; Chen, Zhaohui; Wang, Runsheng; Gao, Mingyu; Li, Meng

Computer Science > Hardware Architecture

arXiv:2507.16391 (cs)

[Submitted on 22 Jul 2025 (v1), last revised 23 Jul 2025 (this version, v2)]

Title:Ironman: Accelerating Oblivious Transfer Extension for Privacy-Preserving AI with Near-Memory Processing

Authors:Chenqi Lin, Kang Yang, Tianshi Xu, Ling Liang, Yufei Wang, Zhaohui Chen, Runsheng Wang, Mingyu Gao, Meng Li

View PDF HTML (experimental)

Abstract:With the wide application of machine learning (ML), privacy concerns arise with user data as they may contain sensitive information. Privacy-preserving ML (PPML) based on cryptographic primitives has emerged as a promising solution in which an ML model is directly computed on the encrypted data to provide a formal privacy guarantee. However, PPML frameworks heavily rely on the oblivious transfer (OT) primitive to compute nonlinear functions. OT mainly involves the computation of single-point correlated OT (SPCOT) and learning parity with noise (LPN) operations. As OT is still computed extensively on general-purpose CPUs, it becomes the latency bottleneck of modern PPML frameworks.
In this paper, we propose a novel OT accelerator, dubbed Ironman, to significantly increase the efficiency of OT and the overall PPML framework. We observe that SPCOT is computation-bounded, and thus propose a hardware-friendly SPCOT algorithm with a customized accelerator to improve SPCOT computation throughput. In contrast, LPN is memory-bandwidth-bounded due to irregular memory access patterns. Hence, we further leverage the near-memory processing (NMP) architecture equipped with memory-side cache and index sorting to improve effective memory bandwidth. With extensive experiments, we demonstrate Ironman achieves a 39.2-237.4 times improvement in OT throughput across different NMP configurations compared to the full-thread CPU implementation. For different PPML frameworks, Ironman demonstrates a 2.1-3.4 times reduction in end-to-end latency for both CNN and Transformer models.

Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2507.16391 [cs.AR]
	(or arXiv:2507.16391v2 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2507.16391

Submission history

From: Chenqi Lin [view email]
[v1] Tue, 22 Jul 2025 09:35:59 UTC (7,525 KB)
[v2] Wed, 23 Jul 2025 09:31:01 UTC (7,525 KB)

Computer Science > Hardware Architecture

Title:Ironman: Accelerating Oblivious Transfer Extension for Privacy-Preserving AI with Near-Memory Processing

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Hardware Architecture

Title:Ironman: Accelerating Oblivious Transfer Extension for Privacy-Preserving AI with Near-Memory Processing

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators