Showing 1–11 of 11 results for author: Liden, L

  1. arXiv:2502.13130  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG cs.RO

    Magma: A Foundation Model for Multimodal AI Agents

    Authors: Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao

    Abstract: We present Magma, a foundation model that serves multimodal AI agentic tasks in both the digital and physical worlds. Magma is a significant extension of vision-language (VL) models in that it not only retains the VL understanding ability (verbal intelligence) of the latter, but is also equipped with the ability to plan and act in the visual-spatial world (spatial-temporal intelligence) and comple…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: 29 pages, 16 figures, technical report from MSR

  2. arXiv:2410.11758  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    Latent Action Pretraining from Videos

    Authors: Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Sejune Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, Lars Liden, Kimin Lee, Jianfeng Gao, Luke Zettlemoyer, Dieter Fox, Minjoon Seo

    Abstract: We introduce Latent Action Pretraining for general Action models (LAPA), an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels. Existing Vision-Language-Action models require action labels typically collected by human teleoperators during pretraining, which significantly limits possible data sources and scale. In this work, we propose a…

    Submitted 15 May, 2025; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: ICLR 2025. Website: https://latentactionpretraining.github.io

  3. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, et al. (104 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version…

    Submitted 30 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 24 pages

  4. arXiv:2303.00915  [pdf, other]

    cs.CV cs.CL

    BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

    Authors: Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Angela Crabtree, Brian Piening, Carlo Bifulco, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon

    Abstract: Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel d…

    Submitted 8 January, 2025; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: The models are released at https://aka.ms/biomedclip

  5. arXiv:2302.12813  [pdf, other]

    cs.CL cs.AI

    Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

    Authors: Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao

    Abstract: Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging mainly due to their tendency to generate hallucinations and their inability to use external knowledge. This paper proposes a LLM-Augmente…

    Submitted 8 March, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 15 pages

  6. arXiv:2206.11309  [pdf, other]

    cs.CL

    GODEL: Large-Scale Pre-Training for Goal-Directed Dialog

    Authors: Baolin Peng, Michel Galley, Pengcheng He, Chris Brockett, Lars Liden, Elnaz Nouri, Zhou Yu, Bill Dolan, Jianfeng Gao

    Abstract: We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-trained language model for dialog. In contrast with earlier models such as DialoGPT, GODEL leverages a new phase of grounded pre-training designed to better support adapting GODEL to a wide range of downstream dialog tasks that require information external to the current conversation (e.g., a database or document) to produce g…

    Submitted 22 June, 2022; originally announced June 2022.

  7. arXiv:2112.07924  [pdf, other]

    cs.CL

    Knowledge-Grounded Dialogue Generation with a Unified Knowledge Representation

    Authors: Yu Li, Baolin Peng, Yelong Shen, Yi Mao, Lars Liden, Zhou Yu, Jianfeng Gao

    Abstract: Knowledge-grounded dialogue systems are challenging to build due to the lack of training data and heterogeneous knowledge sources. Existing systems perform poorly on unseen topics due to limited topics covered in the training data. In addition, heterogeneous knowledge sources make it challenging for systems to generalize to other tasks because knowledge sources in different knowledge representatio…

    Submitted 31 July, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted to NAACL 2022

  8. arXiv:2011.06486  [pdf, ps, other]

    cs.CL

    Overview of the Ninth Dialog System Technology Challenge: DSTC9

    Authors: Chulaka Gunasekara, Seokhwan Kim, Luis Fernando D'Haro, Abhinav Rastogi, Yun-Nung Chen, Mihail Eric, Behnam Hedayatnia, Karthik Gopalakrishnan, Yang Liu, Chao-Wei Huang, Dilek Hakkani-Tür, Jinchao Li, Qi Zhu, Lingxiao Luo, Lars Liden, Kaili Huang, Shahin Shayandeh, Runze Liang, Baolin Peng, Zheng Zhang, Swadheen Shukla, Minlie Huang, Jianfeng Gao, Shikib Mehri, Yulan Feng, et al. (14 additional authors not shown)

    Abstract: This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct tasks in dialog systems, namely, 1. Task-oriented dialog Modeling with unstructured knowledge access, 2. Multi-domain task-oriented dialog, 3. Interactive evaluation of dialog, and 4. Situated interactive multi-modal dialog. This…

    Submitted 12 November, 2020; originally announced November 2020.

  9. arXiv:2009.03457  [pdf, other]

    cs.AI cs.CL

    Robust Conversational AI with Grounded Text Generation

    Authors: Jianfeng Gao, Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Heung-Yeung Shum

    Abstract: This article presents a hybrid approach based on a Grounded Text Generation (GTG) model to building robust task bots at scale. GTG is a hybrid model which uses a large-scale Transformer neural network as its backbone, combined with symbol-manipulation modules for knowledge base inference and prior knowledge encoding, to generate responses grounded in dialog belief state and real-world knowledge fo…

    Submitted 7 September, 2020; originally announced September 2020.

  10. arXiv:2005.05298  [pdf, other]

    cs.CL cs.AI cs.LG

    SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching

    Authors: Baolin Peng, Chunyuan Li, Jinchao Li, Shahin Shayandeh, Lars Liden, Jianfeng Gao

    Abstract: We present a new method SOLOIST that uses transfer learning and machine teaching to build task bots at scale. We parameterize classical modular task-oriented dialog systems using a Transformer-based auto-regressive language model, which subsumes different dialog modules into a single neural model. We pre-train, on heterogeneous dialog corpora, a task-grounded response generation model, which can g…

    Submitted 8 April, 2021; v1 submitted 11 May, 2020; originally announced May 2020.

    Comments: 18 pages; To appear at TACL; Project Website: https://aka.ms/soloist

  11. arXiv:2004.04305  [pdf, other]

    cs.CL cs.AI

    Conversation Learner -- A Machine Teaching Tool for Building Dialog Managers for Task-Oriented Dialog Systems

    Authors: Swadheen Shukla, Lars Liden, Shahin Shayandeh, Eslam Kamal, Jinchao Li, Matt Mazzola, Thomas Park, Baolin Peng, Jianfeng Gao

    Abstract: Traditionally, industry solutions for building a task-oriented dialog system have relied on helping dialog authors define rule-based dialog managers, represented as dialog flows. While dialog flows are intuitively interpretable and good for simple scenarios, they fall short of performance in terms of the flexibility needed to handle complex dialogs. On the other hand, purely machine-learned models…

    Submitted 1 May, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Accepted to ACL 2020 Demonstration Track