-
Magistral
Authors:
Mistral-AI,
Abhinav Rastogi,
Albert Q. Jiang,
Andy Lo,
Gabrielle Berrada,
Guillaume Lample,
Jason Rute,
Joep Barmentlo,
Karmesh Yadav,
Kartik Khandelwal,
Khyathi Raghavi Chandu,
Léonard Blier,
Lucile Saulnier,
Matthieu Dinot,
Maxime Darrin,
Neha Gupta,
Roman Soletskyi,
Sagar Vaze,
Teven Le Scao,
Yihan Wang,
Adam Yang,
Alexander H. Liu,
Alexandre Sablayrolles,
Amélie Héliou
, et al. (76 additional authors not shown)
Abstract:
We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following, and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0), which further includes cold-start data from Magistral Medium.
Submitted 12 June, 2025;
originally announced June 2025.
-
Pixtral 12B
Authors:
Pravesh Agrawal,
Szymon Antoniak,
Emma Bou Hanna,
Baptiste Bout,
Devendra Chaplot,
Jessica Chudnovsky,
Diogo Costa,
Baudouin De Monicault,
Saurabh Garg,
Theophile Gervet,
Soham Ghosh,
Amélie Héliou,
Paul Jacob,
Albert Q. Jiang,
Kartik Khandelwal,
Timothée Lacroix,
Guillaume Lample,
Diego Las Casas,
Thibaut Lavril,
Teven Le Scao,
Andy Lo,
William Marshall,
Louis Martin,
Arthur Mensch,
Pavankumar Muddireddy
, et al. (17 additional authors not shown)
Abstract:
We introduce Pixtral-12B, a 12-billion-parameter multimodal language model. Pixtral-12B is trained to understand both natural images and documents, achieving leading performance on various multimodal benchmarks, surpassing a number of larger models. Unlike many open-source models, Pixtral is also a cutting-edge text model for its size, and does not compromise on natural language performance to excel in multimodal tasks. Pixtral uses a new vision encoder trained from scratch, which allows it to ingest images at their natural resolution and aspect ratio. This gives users flexibility on the number of tokens used to process an image. Pixtral is also able to process any number of images in its long context window of 128K tokens. Pixtral-12B substantially outperforms other open models of similar sizes (Llama-3.2 11B and Qwen-2-VL 7B). It also outperforms much larger open models like Llama-3.2 90B while being 7x smaller. We further contribute an open-source benchmark, MM-MT-Bench, for evaluating vision-language models in practical scenarios, and provide detailed analysis and code for standardized evaluation protocols for multimodal LLMs. Pixtral-12B is released under the Apache 2.0 license.
Submitted 10 October, 2024; v1 submitted 9 October, 2024;
originally announced October 2024.
-
Overcoming Conflicting Data when Updating a Neural Semantic Parser
Authors:
David Gaddy,
Alex Kouzemtchenko,
Pavankumar Reddy Muddireddy,
Prateek Kolhar,
Rushin Shah
Abstract:
In this paper, we explore how to use a small amount of new data to update a task-oriented semantic parsing model when the desired output for some examples has changed. When making updates in this way, one potential problem that arises is the presence of conflicting data, or out-of-date labels in the original training set. To evaluate the impact of this understudied problem, we propose an experimental setup for simulating changes to a neural semantic parser. We show that the presence of conflicting data greatly hinders learning of an update, then explore several methods to mitigate its effect. Our multi-task and data selection methods lead to large improvements in model accuracy compared to a naive data-mixing strategy, and our best method closes 86% of the accuracy gap between this baseline and an oracle upper bound.
Submitted 9 December, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.