
Showing 1–3 of 3 results for author: Glória-Silva, D

  1. arXiv:2409.19074  [pdf, other]

    cs.CV cs.CL

    Show and Guide: Instructional-Plan Grounded Vision and Language Model

    Authors: Diogo Glória-Silva, David Semedo, João Magalhães

    Abstract: Guiding users through complex procedural plans is an inherently multimodal task in which having visually illustrated plan steps is crucial to deliver an effective plan guidance. However, existing works on plan-following language models (LMs) often are not capable of multimodal input and output. In this work, we present MM-PlanLLM, the first multimodal LLM designed to assist users in executing inst…

    Submitted 18 October, 2024; v1 submitted 27 September, 2024; originally announced September 2024.

    Comments: Accepted at EMNLP 2024 Main Track

  2. arXiv:2405.10122  [pdf, other]

    cs.CV

    Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks

    Authors: João Bordalo, Vasco Ramos, Rodrigo Valério, Diogo Glória-Silva, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

    Abstract: Multistep instructions, such as recipes and how-to guides, greatly benefit from visual aids, such as a series of images that accompany the instruction steps. While Large Language Models (LLMs) have become adept at generating coherent textual steps, Large Vision/Language Models (LVLMs) are less capable of generating accompanying image sequences. The most challenging aspect is that each generated im…

    Submitted 16 May, 2024; originally announced May 2024.

  3. arXiv:2402.01053  [pdf, other]

    cs.CL cs.AI

    Plan-Grounded Large Language Models for Dual Goal Conversational Settings

    Authors: Diogo Glória-Silva, Rafael Ferreira, Diogo Tavares, David Semedo, João Magalhães

    Abstract: Training Large Language Models (LLMs) to follow user instructions has been shown to supply the LLM with ample capacity to converse fluently while being aligned with humans. Yet, it is not completely clear how an LLM can lead a plan-grounded conversation in mixed-initiative settings where instructions flow in both directions of the conversation, i.e. both the LLM and the user provide instructions t…

    Submitted 1 February, 2024; originally announced February 2024.