MotherNet: Fast Training and Inference via Hyper-Network Transformers

Müller, Andreas; Curino, Carlo; Ramakrishnan, Raghu

arXiv:2312.08598 (cs)

[Submitted on 14 Dec 2023 (v1), last revised 9 May 2025 (this version, v2)]

Abstract: Foundation models are transforming machine learning across many modalities, with in-context learning replacing classical model training. Recent work on tabular data hints at a similar opportunity to build foundation models for classification on numerical data. However, existing meta-learning approaches cannot compete with tree-based methods in terms of inference time. In this paper, we propose MotherNet, a hypernetwork architecture trained on synthetic classification tasks that, once prompted with a never-seen-before training set, generates the weights of a trained "child" neural network by in-context learning in a single forward pass. In contrast to most existing hypernetworks, which are usually trained for relatively constrained multi-task settings, MotherNet can create models for multiclass classification on arbitrary tabular datasets without any dataset-specific gradient descent. The child network generated by MotherNet outperforms neural networks trained using gradient descent on small datasets, and its predictions are comparable to those of TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of TabPFN, MotherNet-generated networks are highly efficient at inference time. We also demonstrate that HyperFast is unable to perform effective in-context learning on small datasets and relies heavily on dataset-specific fine-tuning and hyper-parameter tuning, while MotherNet requires no fine-tuning or per-dataset hyper-parameters.
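
The data flow described in the abstract (training set in, child-network weights out in one forward pass, then cheap inference with the child alone) can be illustrated with a minimal NumPy sketch. This is not the MotherNet architecture: the hypernetwork here is an untrained, mean-pooled stand-in rather than a transformer, and every name and size (`ToyHyperNetwork`, `MAX_FEATURES`, `HIDDEN`, and so on) is a hypothetical choice made for illustration, including the zero-padding of features to a fixed width.

```python
import numpy as np

# All sizes below are hypothetical and chosen only for illustration.
MAX_FEATURES = 8   # fixed input width; narrower datasets are zero-padded
MAX_CLASSES = 3    # maximum number of classes the child can output
HIDDEN = 16        # hidden width of the generated child MLP
EMBED = 32         # size of the pooled dataset embedding


def pad_features(X):
    """Zero-pad the feature axis of X to MAX_FEATURES columns."""
    out = np.zeros((X.shape[0], MAX_FEATURES))
    out[:, : X.shape[1]] = X
    return out


class ToyHyperNetwork:
    """Untrained stand-in for a hypernetwork (assumption: not a transformer)."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        self.n_w1 = MAX_FEATURES * HIDDEN
        self.n_w2 = HIDDEN * MAX_CLASSES
        n_out = self.n_w1 + HIDDEN + self.n_w2 + MAX_CLASSES
        self.encoder = rng.normal(scale=0.1, size=(MAX_FEATURES + MAX_CLASSES, EMBED))
        self.decoder = rng.normal(scale=0.1, size=(EMBED, n_out))

    def generate_child(self, X_train, y_train):
        """One forward pass: embed each (x, y) row, pool, decode to child weights."""
        rows = np.hstack([pad_features(X_train), np.eye(MAX_CLASSES)[y_train]])
        embedding = np.tanh(rows @ self.encoder).mean(axis=0)
        flat = embedding @ self.decoder
        # Slice the flat vector into the child's two layers.
        i = 0
        w1 = flat[i : i + self.n_w1].reshape(MAX_FEATURES, HIDDEN)
        i += self.n_w1
        b1 = flat[i : i + HIDDEN]
        i += HIDDEN
        w2 = flat[i : i + self.n_w2].reshape(HIDDEN, MAX_CLASSES)
        i += self.n_w2
        b2 = flat[i : i + MAX_CLASSES]
        return w1, b1, w2, b2


def child_predict_proba(weights, X, n_classes):
    """Inference uses only the small child MLP; the hypernetwork is not needed."""
    w1, b1, w2, b2 = weights
    hidden = np.maximum(pad_features(X) @ w1 + b1, 0.0)  # ReLU
    logits = (hidden @ w2 + b2)[:, :n_classes]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)
```

Because the toy hypernetwork is untrained, the child's predictions are meaningless; the sketch only shows why inference is cheap once the weights exist, since `child_predict_proba` touches two small matrices and never revisits the training set.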

Comments:	17 pages, 13 figures
Subjects:	Machine Learning (cs.LG)
ACM classes:	I.2.6
Cite as:	arXiv:2312.08598 [cs.LG]
	(or arXiv:2312.08598v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.08598

Submission history

From: Andreas Müller
[v1] Thu, 14 Dec 2023 01:48:58 UTC (878 KB)
[v2] Fri, 9 May 2025 16:02:18 UTC (586 KB)
