
Showing 1–5 of 5 results for author: Amani, M H

  1. arXiv:2509.14233  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

    Authors: Alejandro Hernández-Cano, Alexander Hägele, Allen Hao Huang, Angelika Romanou, Antoni-Joan Solergibert, Barna Pasztor, Bettina Messmer, Dhia Garbaya, Eduard Frank Ďurech, Ido Hakimi, Juan García Giraldo, Mete Ismayilzada, Negar Foroutan, Skander Moalla, Tiancheng Chen, Vinko Sabolčec, Yixuan Xu, Michael Aerni, Badr AlKhamissi, Ines Altemir Marinas, Mohammad Hossein Amani, Matin Ansaripour, Ilia Badanin, Harold Benoit, Emanuela Boros, et al. (76 additional authors not shown)

    Abstract: We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively r…

    Submitted 17 September, 2025; originally announced September 2025.

  2. arXiv:2506.18110  [pdf, ps, other]

    cs.LG cs.AI

    RL for Reasoning by Adaptively Revealing Rationales

    Authors: Mohammad Hossein Amani, Aryo Lotfi, Nicolas Mario Baldwin, Samy Bengio, Mehrdad Farajtabar, Emmanuel Abbe, Robert West

    Abstract: We propose that reinforcement learning (RL) from partial expert demonstrations is not merely a training heuristic, but a promising framework for solving complex sequence generation tasks. Supervised fine-tuning (SFT) relies on dense ground-truth labels, which become increasingly costly as sequence length grows. RL, on the other hand, struggles with sparse rewards and a combinatorially large output…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 18 pages, 8 figures

  3. arXiv:2402.10575  [pdf, other]

    cs.LG cs.AI

    Symbolic Autoencoding for Self-Supervised Sequence Learning

    Authors: Mohammad Hossein Amani, Nicolas Mario Baldwin, Amin Mansouri, Martin Josifoski, Maxime Peyrard, Robert West

    Abstract: Traditional language models, adept at next-token prediction in text sequences, often struggle with transduction tasks between distinct symbolic systems, particularly when parallel data is scarce. Addressing this issue, we introduce \textit{symbolic autoencoding} ($Σ$AE), a self-supervised framework that harnesses the power of abundant unparallel data alongside limited parallel data. $Σ$AE connects…

    Submitted 16 February, 2024; originally announced February 2024.

  4. arXiv:2205.10217  [pdf, other]

    stat.ML cs.IT cs.LG

    Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

    Authors: Simone Bombari, Mohammad Hossein Amani, Marco Mondelli

    Abstract: The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least a layer with $Ω(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-li…

    Submitted 21 May, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: Uniformed with the published NeurIPS 2022 version

  5. arXiv:2205.08199  [pdf, ps, other]

    cs.IT cs.LG stat.ML

    Sharp asymptotics on the compression of two-layer neural networks

    Authors: Mohammad Hossein Amani, Simone Bombari, Marco Mondelli, Rattana Pukdee, Stefano Rini

    Abstract: In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools…

    Submitted 16 August, 2022; v1 submitted 17 May, 2022; originally announced May 2022.