On Least Square Estimation in Softmax Gating Mixture of Experts

Nguyen, Huy; Ho, Nhat; Rinaldo, Alessandro


arXiv:2402.02952 (stat)

[Submitted on 5 Feb 2024 (v1), last revised 24 Jun 2024 (this version, v2)]


Abstract: The mixture of experts (MoE) model is a statistical machine learning design that aggregates multiple expert networks using a softmax gating function to form a more intricate and expressive model. Although MoE models are commonly used in several applications owing to their scalability, their mathematical and statistical properties are complex and difficult to analyze. As a result, previous theoretical works have primarily focused on probabilistic MoE models, imposing the impractical assumption that the data are generated from a Gaussian MoE model. In this work, we investigate the performance of the least squares estimators (LSE) under a deterministic MoE model where the data are sampled according to a regression model, a setting that has remained largely unexplored. We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions. We demonstrate that the rates for estimating strongly identifiable experts, namely the widely used feed-forward networks with activation functions $\mathrm{sigmoid}(\cdot)$ and $\tanh(\cdot)$, are substantially faster than those of polynomial experts, which we show to exhibit a surprisingly slow estimation rate. Our findings have important practical implications for expert selection.
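The setting described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the affine gating scores, the single-layer $\tanh$ experts, the parameter values, the noiseless toy data, and the helper names `moe_predict` and `least_squares_loss`. It shows only the general shape of a softmax-gated regression function and the least squares objective that an LSE would minimize.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def moe_predict(x, gates, experts):
    """Softmax-gated mixture: sum_i softmax_i(w_i.x + c_i) * tanh(a_i.x + b_i).

    The affine gate and the tanh expert form are illustrative assumptions.
    """
    weights = softmax([dot(w, x) + c for w, c in gates])
    outputs = [math.tanh(dot(a, x) + b) for a, b in experts]
    return sum(p * h for p, h in zip(weights, outputs))

def least_squares_loss(data, gates, experts):
    """Mean squared residual over (x, y) pairs."""
    residuals = [(y - moe_predict(x, gates, experts)) ** 2 for x, y in data]
    return sum(residuals) / len(data)

# Hypothetical "true" parameters: (weight vector, bias) per gate and per expert.
true_gates = [([1.0, -0.5], 0.0), ([-1.0, 0.5], 0.2)]
true_experts = [([0.8, 0.3], -0.1), ([-0.4, 1.2], 0.5)]

# Noiseless toy responses generated from the regression function itself.
inputs = [[0.0, 0.0], [1.0, 2.0], [-1.5, 0.5], [0.3, -0.7]]
data = [(x, moe_predict(x, true_gates, true_experts)) for x in inputs]

# The objective vanishes at the generating parameters and grows when
# one expert's bias is perturbed.
loss_at_truth = least_squares_loss(data, true_gates, true_experts)
perturbed_experts = [([0.8, 0.3], 0.4), ([-0.4, 1.2], 0.5)]
loss_perturbed = least_squares_loss(data, true_gates, perturbed_experts)
```

A least squares estimator in this sketch would be any minimizer of `least_squares_loss` over the gate and expert parameters; the optimization routine is deliberately left out, since the abstract does not specify one.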

Comments:	Accepted to ICML 2024, 29 pages, 2 figures, 2 tables
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2402.02952 [stat.ML]
	(or arXiv:2402.02952v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2402.02952

Submission history

From: Huy Nguyen
[v1] Mon, 5 Feb 2024 12:31:18 UTC (58 KB)
[v2] Mon, 24 Jun 2024 04:32:55 UTC (1,074 KB)
