-
Magistral
Authors:
Mistral-AI,
Abhinav Rastogi,
Albert Q. Jiang,
Andy Lo,
Gabrielle Berrada,
Guillaume Lample,
Jason Rute,
Joep Barmentlo,
Karmesh Yadav,
Kartik Khandelwal,
Khyathi Raghavi Chandu,
Léonard Blier,
Lucile Saulnier,
Matthieu Dinot,
Maxime Darrin,
Neha Gupta,
Roman Soletskyi,
Sagar Vaze,
Teven Le Scao,
Yihan Wang,
Adam Yang,
Alexander H. Liu,
Alexandre Sablayrolles,
Amélie Héliou,
et al. (76 additional authors not shown)
Abstract:
We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach, relying solely on our own models and infrastructure. Notably, we demonstrate a stack that enabled us to explore the limits of pure RL training of LLMs, present a simple method to force the reasoning language of the model, and show that RL on text data alone maintains most of the initial checkpoint's capabilities. We find that RL on text maintains or improves multimodal understanding, instruction following and function calling. We present Magistral Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we open-source Magistral Small (Apache 2.0), which further includes cold-start data from Magistral Medium.
Submitted 12 June, 2025;
originally announced June 2025.
-
AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data
Authors:
Sina Rashidian,
Nan Li,
Jonathan Amar,
Jong Ha Lee,
Sam Pugh,
Eric Yang,
Geoff Masterson,
Myoung Cha,
Yugang Jia,
Akhil Vaid
Abstract:
Background: We present a Patient Simulator that leverages real-world patient encounters which cover a broad range of conditions and symptoms to provide synthetic test subjects for development and testing of healthcare agentic models. The simulator provides a realistic approach to patient presentation and multi-turn conversation with a symptom-checking agent. Objectives: (1) To construct and instantiate a Patient Simulator to train and test an AI health agent, based on patient vignettes derived from real EHR data. (2) To test the validity and alignment of the simulated encounters provided by the Patient Simulator to expert human clinical providers. (3) To illustrate the evaluation framework of such an LLM system on the generated realistic, data-driven simulations -- yielding a preliminary assessment of our proposed system. Methods: We first constructed realistic clinical scenarios by deriving patient vignettes from real-world EHR encounters. These vignettes cover a variety of presenting symptoms and underlying conditions. We then evaluate the performance of the Patient Simulator as a simulacrum of a real patient encounter across more than 500 different patient vignettes. We leveraged a separate AI agent to provide multi-turn questions to obtain a history of present illness. The resulting multi-turn conversations were evaluated by two expert clinicians. Results: Clinicians scored the Patient Simulator as consistent with the patient vignettes in 97.7% of cases. The extracted case summary based on the conversation history was 99% relevant. Conclusions: We developed a methodology to incorporate vignettes derived from real healthcare patient data to build a simulation of patient responses to symptom checking agents. The performance and alignment of this Patient Simulator could be used to train and test a multi-turn conversational AI agent at scale.
Submitted 4 June, 2025;
originally announced June 2025.
-
Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions
Authors:
Taedong Yun,
Eric Yang,
Mustafa Safdari,
Jong Ha Lee,
Vaishnavi Vinod Kumar,
S. Sara Mahdavi,
Jonathan Amar,
Derek Peyton,
Reut Aharony,
Andreas Michaelides,
Logan Schneider,
Isaac Galatzer-Levy,
Yugang Jia,
John Canny,
Arthur Gretton,
Maja Matarić
Abstract:
We present an end-to-end framework for generating synthetic users for evaluating interactive agents designed to encourage positive behavior changes, such as in health and lifestyle coaching. The synthetic users are grounded in health and lifestyle conditions, specifically sleep and diabetes management in this study, to ensure realistic interactions with the health coaching agent. Synthetic users are created in two stages: first, structured data are generated grounded in real-world health and lifestyle factors in addition to basic demographics and behavioral attributes; second, full profiles of the synthetic users are developed conditioned on the structured data. Interactions between synthetic users and the coaching agent are simulated using generative agent-based models such as Concordia, or directly by prompting a language model. Using two independently-developed agents for sleep and diabetes coaching as case studies, the validity of this framework is demonstrated by analyzing the coaching agent's understanding of the synthetic users' needs and challenges. Finally, through multiple blinded evaluations of user-coach interactions by human experts, we demonstrate that our synthetic users with health and behavioral attributes more accurately portray real human users with the same attributes, compared to generic synthetic users not grounded in such attributes. The proposed framework lays the foundation for efficient development of conversational agents through extensive, realistic, and grounded simulated interactions.
Submitted 4 June, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching
Authors:
Eric Yang,
Tomas Garcia,
Hannah Williams,
Bhawesh Kumar,
Martin Ramé,
Eileen Rivera,
Yiran Ma,
Jonathan Amar,
Caricia Catalani,
Yugang Jia
Abstract:
Effective management of cardiometabolic conditions requires sustained positive nutrition habits, often hindered by complex and individualized barriers. Direct human management is simply not scalable, while previous attempts aimed at automating nutrition coaching lack the personalization needed to address these diverse challenges. This paper introduces a novel LLM-powered agentic workflow designed to provide personalized nutrition coaching by directly targeting and mitigating patient-specific barriers. Grounded in behavioral science principles, the workflow leverages a comprehensive mapping of nutrition-related barriers to corresponding evidence-based strategies. A specialized LLM agent intentionally probes for and identifies the root cause of a patient's dietary struggles. Subsequently, a separate LLM agent delivers tailored tactics designed to overcome those specific barriers with patient context. We designed and validated our approach through a user study with individuals with cardiometabolic conditions, demonstrating the system's ability to accurately identify barriers and provide personalized guidance. Furthermore, we conducted a large-scale simulation study, grounding on real patient vignettes and expert-validated metrics, to evaluate the system's performance across a wide range of scenarios. Our findings demonstrate the potential of this LLM-powered agentic workflow to improve nutrition coaching by providing personalized, scalable, and behaviorally-informed interventions.
Submitted 17 October, 2024;
originally announced October 2024.
-
The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation
Authors:
Eric Yang,
Jonathan Amar,
Jong Ha Lee,
Bhawesh Kumar,
Yugang Jia
Abstract:
Digital health chatbots powered by Large Language Models (LLMs) have the potential to significantly improve personal health management for chronic conditions by providing accessible and on-demand health coaching and question-answering. However, these chatbots risk providing unverified and inaccurate information because LLMs generate responses based on patterns learned from diverse internet data. Retrieval Augmented Generation (RAG) can help mitigate hallucinations and inaccuracies in LLM responses by grounding them in reliable content. However, efficiently and accurately retrieving the most relevant set of content for real-time user questions remains a challenge. In this work, we introduce Query-Based Retrieval Augmented Generation (QB-RAG), a novel approach that pre-computes a database of potential queries from a content base using LLMs. For an incoming patient question, QB-RAG efficiently matches it against this pre-generated query database using vector search, improving alignment between user questions and the content. We establish a theoretical foundation for QB-RAG and provide a comparative analysis of existing retrieval enhancement techniques for RAG systems. Finally, our empirical evaluation demonstrates that QB-RAG significantly improves the accuracy of healthcare question answering, paving the way for robust and trustworthy LLM applications in digital health.
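The query-matching idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the pre-generated queries are hand-written rather than produced by an LLM, and the names `QUERY_DB`, `INDEX`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Offline step (hypothetical data): potential queries paired with the content
# item each one was generated from. Hand-written here for illustration.
QUERY_DB: List[Tuple[str, str]] = [
    ("what foods raise blood sugar", "doc_nutrition"),
    ("how much sleep do adults need", "doc_sleep"),
    ("how often should i check my blood pressure", "doc_bp"),
]
INDEX = [(embed(query), query, doc_id) for query, doc_id in QUERY_DB]


def retrieve(question: str) -> Tuple[str, str]:
    """Match the question against pre-generated queries, not raw content."""
    q_vec = embed(question)
    _, best_query, best_doc = max(INDEX, key=lambda item: cosine(q_vec, item[0]))
    return best_query, best_doc


if __name__ == "__main__":
    print(retrieve("How much sleep do I need?"))
```

The point of the sketch is only the shape of the lookup: the incoming question is compared with stored queries, which tend to be phrased like user questions, and the matched query then points back to its source content.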
Submitted 25 July, 2024;
originally announced July 2024.
-
Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection
Authors:
Bhawesh Kumar,
Jonathan Amar,
Eric Yang,
Nan Li,
Yugang Jia
Abstract:
Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, LLMs often need to be fine-tuned on task-specific expert-annotated data to achieve optimal performance, which can be expensive and time-consuming. In this study, we fine-tune PaLM-2 with parameter-efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pro 1.0 for the detection of Schedule-of-Event (SoE) tables, which specify the care plan in clinical trial protocols. We introduce a filtering mechanism to select high-confidence labels for this table classification task, thereby reducing the noise in the auto-generated labels. We show that PaLM-2 fine-tuned with those labels achieves performance that exceeds that of gemini-pro 1.0 and other LLMs. Furthermore, its performance is close to that of a PaLM-2 fine-tuned on labels obtained from non-expert annotators. Our results show that leveraging LLM-generated labels through powerful models like gemini-pro can potentially serve as a viable strategy for improving LLM performance through fine-tuning in specialized tasks, particularly in domains where expert annotations are scarce, expensive, or time-consuming to obtain.
Submitted 5 August, 2024; v1 submitted 9 May, 2024;
originally announced May 2024.
-
Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health
Authors:
Jackson A. Killian,
Manish Jain,
Yugang Jia,
Jonathan Amar,
Erich Huang,
Milind Tambe
Abstract:
Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources. RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health. For such high stakes settings, decisions must both improve outcomes and prevent disparities between groups (e.g., ensure health equity). We study equitable objectives for RMABs (ERMABs) for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare. We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes for the latter. Finally, we demonstrate across three simulation domains, including a new digital health model, that our approaches can be multiple times more equitable than the current state of the art without drastic sacrifices to utility. Our findings underscore our work's urgency as RMABs permeate into systems that impact human and wildlife outcomes. Code is available at https://github.com/google-research/socialgood/tree/equitable-rmab
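To make the two equity-aligned objectives named in this abstract concrete, here is a small sketch comparing them with a plain utility sum. It is an illustration under stated assumptions, not the paper's algorithms: the objectives are evaluated over per-group rewards, minimax reward is read as "favor the allocation whose worst-off group does best", Nash welfare is computed as a sum of log rewards, and the reward numbers and allocation names are invented.

```python
import math
from typing import Callable, Dict

GroupRewards = Dict[str, float]


def total_utility(rewards: GroupRewards) -> float:
    """Plain utilitarian objective: sum of group rewards."""
    return sum(rewards.values())


def minimax_reward(rewards: GroupRewards) -> float:
    """Score an allocation by its worst-off group (higher is better)."""
    return min(rewards.values())


def nash_welfare(rewards: GroupRewards) -> float:
    """Log of the product of group rewards; assumes all rewards are positive."""
    return sum(math.log(r) for r in rewards.values())


def pick_best(
    allocations: Dict[str, GroupRewards],
    objective: Callable[[GroupRewards], float],
) -> str:
    """Return the name of the allocation that maximizes the given objective."""
    return max(allocations, key=lambda name: objective(allocations[name]))


# Hypothetical expected rewards per group under two candidate allocations.
ALLOCATIONS = {
    "utilitarian": {"group_1": 9.0, "group_2": 1.0},
    "balanced": {"group_1": 5.0, "group_2": 4.0},
}

if __name__ == "__main__":
    for objective in (total_utility, minimax_reward, nash_welfare):
        print(objective.__name__, "->", pick_best(ALLOCATIONS, objective))
```

In this toy case the utility sum prefers the lopsided allocation, while both equity objectives prefer the balanced one at the cost of one unit of total reward.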
Submitted 17 August, 2023;
originally announced August 2023.
-
The Second-Price Knapsack Problem: Near-Optimal Real Time Bidding in Internet Advertisement
Authors:
Jonathan Amar,
Nicholas Renegar
Abstract:
In many online advertisement (ad) exchanges, ad slots are each sold via a separate second-price auction. This paper considers the bidder's problem of maximizing the value of ads they purchase in these auctions, subject to budget constraints. This 'second-price knapsack' problem presents challenges when devising a bidding strategy because of the uncertain resource consumption: bidders win if they bid the highest amount, but pay the second-highest bid, unknown a priori. This is in contrast to the traditional online knapsack problem, where posted prices are revealed when ads arrive, and for which there exists a rich literature of primal and dual algorithms.
The main results of this paper establish general methods for adapting these primal and dual online knapsack selection algorithms to the second-price knapsack problem, where the prices are revealed only after bidding. In particular, a methodology is provided for converting deterministic and randomized knapsack selection algorithms into second-price knapsack bidding strategies, that purchase ads through an equivalent set of criteria and thereby achieve the same competitive guarantees. This shows a connection between the traditional knapsack selection algorithm and second-price auction bidding algorithms, that has not previously been leveraged.
Empirical analysis on real ad exchange data verifies the usefulness of this method, and gives examples where it can outperform state-of-the-art techniques.
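The uncertain resource consumption described in this abstract can be seen in a short sketch of the auction mechanics. This is not the paper's bidding strategy: the function names, the rule that ties are lost, and the choice to cap each bid at the remaining budget are assumptions made for illustration, and the auction data are invented.

```python
from typing import List, Sequence, Tuple


def second_price_outcome(my_bid: float, competing_bids: Sequence[float]) -> Tuple[bool, float]:
    """Return (won, price) for one auction; ties are treated as losses here."""
    highest_other = max(competing_bids, default=0.0)
    if my_bid > highest_other:
        # The winner pays the second-highest bid, which is unknown when bidding.
        return True, highest_other
    return False, 0.0


def simulate(
    auctions: List[Tuple[float, float, Sequence[float]]], budget: float
) -> Tuple[float, float]:
    """Run auctions in order; each item is (ad value, intended bid, competing bids).

    Capping the bid at the remaining budget guarantees the price paid,
    which never exceeds the bid, cannot overspend the budget.
    """
    total_value = 0.0
    spent = 0.0
    for value, bid, others in auctions:
        capped_bid = min(bid, budget - spent)
        won, price = second_price_outcome(capped_bid, others)
        if won:
            spent += price
            total_value += value
    return total_value, spent


if __name__ == "__main__":
    demo = [(5.0, 4.0, [3.0, 1.0]), (2.0, 1.0, [1.5]), (4.0, 3.0, [2.0])]
    print(simulate(demo, budget=5.0))
```

In the demo, the third auction is lost only because the remaining budget caps the bid at the competing price, which is the kind of interaction between budget and unknown prices that a posted-price knapsack does not have.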
Submitted 12 March, 2020; v1 submitted 24 October, 2018;
originally announced October 2018.
-
Extinction and Survival in Two-Species Annihilation
Authors:
J. G. Amar,
E. Ben-Naim,
S. M. Davis,
P. L. Krapivsky
Abstract:
We study diffusion-controlled two-species annihilation with a finite number of particles. In this stochastic process, particles move diffusively, and when two particles of opposite type come into contact, the two annihilate. We focus on the behavior in three spatial dimensions and for initial conditions where particles are confined to a compact domain. Generally, one species outnumbers the other, and we find that the difference between the number of majority and minority species, which is a conserved quantity, controls the behavior. When the number difference exceeds a critical value, the minority becomes extinct and a finite number of majority particles survive, while below this critical difference, a finite number of particles of both species survive. The critical difference $Δ_c$ grows algebraically with the total initial number of particles $N$, and when $N\gg 1$, the critical difference scales as $Δ_c\sim N^{1/3}$. Furthermore, when the initial concentrations of the two species are equal, the average number of surviving majority and minority particles, $M_+$ and $M_-$, exhibit two distinct scaling behaviors, $M_+\sim N^{1/2}$ and $M_-\sim N^{1/6}$. In contrast, when the initial populations are equal, these two quantities are comparable $M_+\sim M_-\sim N^{1/3}$.
Submitted 17 November, 2017;
originally announced November 2017.
-
A new gene co-expression network analysis based on Core Structure Detection (CSD)
Authors:
A-C Brunet,
J-M Azais,
J-M Loubes,
J Amar,
R Burcelin
Abstract:
We propose a novel method to cluster gene networks. Based on a dissimilarity built using correlation structures, we consider networks that connect all the genes based on the strength of their dissimilarity. The large number of genes requires the use of a threshold to find sparse structures in the graph. In this work, using the notion of graph coreness, we identify clusters of genes which are central in the network. Then we estimate a network that has these genes as main hubs. We use this new representation to identify biologically meaningful clusters, and to highlight the importance of the nodes that compose the core structures based on biological interpretations.
Submitted 6 July, 2016;
originally announced July 2016.
-
Effects of cluster diffusion on the island density and size distribution in submonolayer island growth
Authors:
Yevgen A. Kryukov,
Jacques G. Amar
Abstract:
The effects of cluster diffusion on the submonolayer island density and island-size distribution are studied for the case of irreversible growth of compact islands on a 2D substrate. In our model, we assume instantaneous coalescence of circular islands, while the cluster mobility is assumed to exhibit power-law decay as a function of island-size with exponent mu. Results are presented for mu = 1/2, 1, and 3/2 corresponding to cluster diffusion via Brownian motion, correlated evaporation-condensation, and edge-diffusion, respectively, as well as for higher values including mu = 2, 3, and 6. We also compare our results with those obtained in the limit of no cluster mobility corresponding to mu = infinity. In agreement with theoretical predictions of power-law behavior of the island-size distribution (ISD) for mu < 1, for mu = 1/2 we find Ns(θ) ~ s^{-τ} (where Ns(θ) is the number of islands of size s at coverage θ) up to a cross-over island-size S_c. However, the value of τ obtained in our simulations is higher than the mean-field (MF) prediction τ = (3 - mu)/2. Similarly, the value of the exponent ζ corresponding to the dependence of S_c on the average island-size S (i.e. S_c ~ S^ζ) is also significantly higher than the MF prediction ζ = 2/(mu+1). A generalized scaling form for the ISD is also proposed for mu < 1, and using this form excellent scaling is found for mu = 1/2. However, for finite mu >= 1 neither the generalized scaling form nor the standard scaling form Ns(θ) = θ/S^2 f(s/S) leads to scaling of the entire ISD for finite values of the ratio R of the monomer diffusion rate to deposition flux. Instead, the scaled ISD becomes more sharply peaked with increasing R and coverage. This is in contrast to models of epitaxial growth with limited cluster mobility for which good scaling occurs over a wide range of coverages.
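As a worked check of the two mean-field formulas quoted in this abstract, the values at mu = 1/2 (written as \mu below) follow by direct substitution. The subscript MF is notation added here for clarity; the abstract states only that the simulated exponents are higher than these values and does not give them.

```latex
\tau_{\mathrm{MF}} = \frac{3-\mu}{2}
  \;\xrightarrow{\;\mu = 1/2\;}\;
  \frac{3 - \tfrac{1}{2}}{2} = \frac{5}{4} = 1.25,
\qquad
\zeta_{\mathrm{MF}} = \frac{2}{\mu+1}
  \;\xrightarrow{\;\mu = 1/2\;}\;
  \frac{2}{\tfrac{3}{2}} = \frac{4}{3} \approx 1.33 .
```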
Submitted 13 March, 2011;
originally announced March 2011.
-
Effects of shadowing in oblique-incidence metal(100) epitaxial growth
Authors:
Yunsic Shim,
Jacques G. Amar
Abstract:
The effects of shadowing in oblique-incidence metal(100) epitaxial growth are studied using a simplified model. We find that many of the features observed in Cu(100) growth, including the existence of a transition from anisotropic mounds to ripples perpendicular to the beam, can be explained purely by geometrical effects. We also show that the formation of (111) facets is crucial to the development of ripples at large angles of incidence. A second transition to 'rods' with (111) facets oriented parallel to the beam is also found at high deposition angles and film thicknesses.
Submitted 9 August, 2006;
originally announced August 2006.
-
Synchronous relaxation algorithm for parallel kinetic Monte Carlo
Authors:
Yunsic Shim,
Jacques G. Amar
Abstract:
We investigate the applicability of the synchronous relaxation (SR) algorithm to parallel kinetic Monte Carlo simulations of simple models of thin-film growth. A variety of techniques for optimizing the parallel efficiency are also presented. We find that the parallel efficiency is determined by three main factors: the calculation overhead due to relaxation iterations to correct boundary events in neighboring processors, the (extreme) fluctuations in the number of events per cycle in each processor, and the overhead due to interprocessor communications. Due to the existence of fluctuations and the requirement of global synchronization, the SR algorithm does not scale, i.e. the parallel efficiency decreases logarithmically as the number of processors increases. The dependence of the parallel efficiency on simulation parameters such as the processor size, domain decomposition geometry, and the ratio $D/F$ of the monomer hopping rate $D$ to the deposition rate $F$ is also discussed.
Submitted 22 June, 2004;
originally announced June 2004.
-
Synchronous sublattice algorithm for parallel kinetic Monte Carlo
Authors:
Yunsic Shim,
Jacques G. Amar
Abstract:
The standard kinetic Monte Carlo algorithm is an extremely efficient method to carry out serial simulations of dynamical processes such as thin-film growth. However, in some cases it is necessary to study systems over extended time and length scales, and therefore a parallel algorithm is desired. Here we describe an efficient, semi-rigorous synchronous sublattice algorithm for parallel kinetic Monte Carlo simulations. The accuracy and parallel efficiency are studied as a function of diffusion rate, processor size, and number of processors for a variety of simple models of epitaxial growth. The effects of fluctuations on the parallel efficiency are also studied. Since only local communications are required, linear scaling behavior is observed, i.e. the parallel efficiency is independent of the number of processors for fixed processor size.
Submitted 24 June, 2004; v1 submitted 16 June, 2004;
originally announced June 2004.
-
Asymptotic Capture-Number and Island-Size Distributions for One-Dimensional Irreversible Submonolayer Growth
Authors:
J. G. Amar,
M. N. Popescu
Abstract:
Using a set of evolution equations [J.G. Amar et al., Phys. Rev. Lett. 86, 3092 (2001)] for the average gap-size between islands, we calculate analytically the asymptotic scaled capture-number distribution (CND) for one-dimensional irreversible submonolayer growth of point islands. The predicted asymptotic CND is in reasonably good agreement with kinetic Monte-Carlo (KMC) results and leads to a non-divergent asymptotic scaled island-size distribution (ISD). We then show that a slight modification of our analytical form leads to an analytic expression for the asymptotic CND and a resulting asymptotic ISD which are in excellent agreement with KMC simulations. We also show that in the asymptotic limit the self-averaging property of the capture zones holds exactly while the asymptotic scaled gap distribution is equal to the scaled CND.
Submitted 11 July, 2003;
originally announced July 2003.