Showing 1–1 of 1 results for author: Reiss, D A

  1. arXiv:2506.14794  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors

    Authors: Henrik Klagges, Robert Dahlke, Fabian Klemm, Benjamin Merkel, Daniel Klingmann, David A. Reiss, Dan Zecha

    Abstract: Requiring $10^{13}$-$10^{15}$ FLOPs to calculate one 8 bit weight in an LLM during pretraining is extremely expensive and seems inefficient. To better leverage the huge investments made into pretrained models, we develop the new "Assembly-of-Experts" (AoE) construction method to create capable child variants of existing Mixture-of-Experts parent models in linear time. Model weight tensors get inte…

    Submitted 31 May, 2025; originally announced June 2025.