Mehdi Ali and Pedro Ortiz Suarez are qualified to endorse.

Tokenizer Choice For LLM Training: Negligible or Crucial?

Mehdi Ali: is registered as an author of this paper.
Can endorse for cs.AI, cs.LG.
Jan Ebert: is registered as an author of this paper.
Not currently an endorser.
Pedro Ortiz Suarez: is registered as an author of this paper.
Can endorse for cs.CL.

Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim and Nicolas Flores-Herr are not registered as owners of this paper.