
Showing 1–6 of 6 results for author: Kauffmann, P

  1. arXiv:2504.21318  [pdf, other]

    cs.AI cs.CL

    Phi-4-reasoning Technical Report

    Authors: Marah Abdin, Sahaj Agarwal, Ahmed Awadallah, Vidhisha Balachandran, Harkirat Behl, Lingjiao Chen, Gustavo de Rosa, Suriya Gunasekar, Mojan Javaheripi, Neel Joshi, Piero Kauffmann, Yash Lara, Caio César Teodoro Mendes, Arindam Mitra, Besmira Nushi, Dimitris Papailiopoulos, Olli Saarikivi, Shital Shah, Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora Yousefi, Guoqing Zheng

    Abstract: We introduce Phi-4-reasoning, a 14-billion parameter reasoning model that achieves strong performance on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on a carefully curated set of "teachable" prompts (selected for the right level of complexity and diversity) and reasoning demonstrations generated using o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectivel…

    Submitted 30 April, 2025; originally announced April 2025.

  2. arXiv:2412.08905  [pdf, other]

    cs.CL cs.AI

    Phi-4 Technical Report

    Authors: Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang, Rachel Ward, Yue Wu, Dingli Yu, et al. (2 additional authors not shown)

    Abstract: We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabil…

    Submitted 11 December, 2024; originally announced December 2024.

  3. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, et al. (104 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version…

    Submitted 30 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 24 pages

  4. arXiv:2306.11644  [pdf, other]

    cs.CL cs.AI cs.LG

    Textbooks Are All You Need

    Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

    Abstract: We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…

    Submitted 2 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 26 pages; changed color scheme of plot, fixed minor typos, and added a couple of clarifications

  5. arXiv:1910.12125  [pdf]

    physics.flu-dyn physics.ao-ph physics.comp-ph

    Deep learning for subgrid-scale turbulence modeling in large-eddy simulations of the atmospheric boundary layer

    Authors: Yu Cheng, Marco Giometto, Pit Kauffmann, Ling Lin, Chen Cao, Cody Zupnick, Harold Li, Qi Li, Ryan Abernathey, Pierre Gentine

    Abstract: In large-eddy simulations, subgrid-scale (SGS) processes are parameterized as a function of filtered grid-scale variables. First-order, algebraic SGS models are based on the eddy-viscosity assumption, which does not always hold for turbulence. Here we apply supervised deep neural networks (DNNs) to learn SGS stresses from a set of neighboring coarse-grained velocity from direct numerical simulatio…

    Submitted 26 October, 2019; originally announced October 2019.

    Comments: 33 pages, 11 figures, 3 tables

  6. arXiv:1812.03963  [pdf, other]

    astro-ph.SR astro-ph.EP

    Cloud Atlas: Hubble Space Telescope Near-Infrared Spectral Library of Brown Dwarfs, Planetary-mass companions, and hot Jupiters

    Authors: Elena Manjavacas, Daniel Apai, Yifan Zhou, Ben W. P. Lew, Glenn Schneider, Stan Metchev, Paulo A. Miles-Paez, Jacqueline Radigan, Mark S. Marley, Nicolas Cowan, Theodora Karalidi, Adam J. Burgasser, Luigi R. Bedin, Patrick J. Lowrance, Parker Kauffmann

    Abstract: Bayesian atmospheric retrieval tools can place constraints on the properties of brown dwarf and hot Jupiter atmospheres. To fully exploit these methods, high signal-to-noise spectral libraries with well-understood uncertainties are essential. We present a high signal-to-noise spectral library (1.10-1.69 microns) of the thermal emission of 76 brown dwarfs and hot Jupiters. All our spectra have be…

    Submitted 10 December, 2018; originally announced December 2018.