Showing 1–23 of 23 results for author: Eshghi, A

  1. arXiv:2410.20200  [pdf, other]

    cs.CL

    Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs

    Authors: Houman Mehrafarin, Arash Eshghi, Ioannis Konstas

    Abstract: Evaluating Large Language Models (LLMs) on reasoning benchmarks demonstrates their ability to solve compositional questions. However, little is known of whether these models engage in genuine logical reasoning or simply rely on implicit cues to generate answers. In this paper, we investigate the transitive reasoning capabilities of two distinct LLM architectures, LLaMA 2 and Flan-T5, by manipulati…

    Submitted 26 October, 2024; originally announced October 2024.

    Comments: To appear in EMNLP Main 2024

  2. arXiv:2409.14247  [pdf, other]

    cs.CL cs.HC

    Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models

    Authors: Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi

    Abstract: In dialogue, the addressee may initially misunderstand the speaker and respond erroneously, often prompting the speaker to correct the misunderstanding in the next turn with a Third Position Repair (TPR). The ability to process and respond appropriately to such repair sequences is thus crucial in conversational AI systems. In this paper, we first collect, analyse, and publicly release BlockWorld-R…

    Submitted 4 October, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

    Comments: Accepted to EMNLP'24 Main (Upcoming). Data and code at www.github.com/JChiyah/blockworld-repairs - for Bibtex see https://raw.githubusercontent.com/JChiyah/blockworld-repairs/refs/heads/main/citation.bib

  3. arXiv:2409.05395  [pdf, other]

    cs.CV cs.AI cs.LG

    Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling

    Authors: Georgios Pantazopoulos, Malvina Nikandrou, Alessandro Suglia, Oliver Lemon, Arash Eshghi

    Abstract: This study explores replacing Transformers in Visual Language Models (VLMs) with Mamba, a recent structured state space model (SSM) that demonstrates promising performance in sequence modeling. We test models up to 3B parameters under controlled conditions, showing that Mamba-based VLMs outperform Transformers-based VLMs in captioning, question answering, and reading comprehension. However, we fi…

    Submitted 1 October, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

  4. arXiv:2406.13807  [pdf, other]

    cs.CV cs.AI cs.CL

    AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding

    Authors: Alessandro Suglia, Claudio Greco, Katie Baker, Jose L. Part, Ioannis Papaioannou, Arash Eshghi, Ioannis Konstas, Oliver Lemon

    Abstract: AI personal assistants deployed via robots or wearables require embodied understanding to collaborate with humans effectively. However, current Vision-Language Models (VLMs) primarily focus on third-person view videos, neglecting the richness of egocentric perceptual experience. To address this gap, we propose three key contributions. First, we introduce the Egocentric Video Understanding Dataset…

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Code available https://github.com/alanaai/EVUD

  5. arXiv:2404.13594  [pdf, other]

    cs.CV cs.AI

    Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers

    Authors: Georgios Pantazopoulos, Alessandro Suglia, Oliver Lemon, Arash Eshghi

    Abstract: An effective method for combining frozen large language models (LLM) and visual encoders involves a resampler module that creates a 'visual prompt' which is provided to the LLM, along with the textual prompt. While this approach has enabled impressive performance across many coarse-grained tasks like image captioning and visual question answering, more fine-grained tasks that require spatial under…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: NAACL 2024

  6. arXiv:2311.04067  [pdf, other]

    cs.LG cs.AI cs.CV

    Multitask Multimodal Prompted Training for Interactive Embodied Task Completion

    Authors: Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia

    Abstract: Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation. To tackle these challenges, we propose an Embodied MultiModal Agent (EMMA): a unified encoder-decoder model that reasons over images and trajectories, and casts action predi…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  7. arXiv:2308.11683  [pdf, other]

    cs.CL

    Learning to generate and corr- uh I mean repair language in real-time

    Authors: Arash Eshghi, Arash Ashrafzadeh

    Abstract: In conversation, speakers produce language incrementally, word by word, while continuously monitoring the appropriateness of their own contribution in the dynamically unfolding context of the conversation; and this often leads them to repair their own utterance on the fly. This real-time language processing capacity is furthermore crucial to the development of fluent and natural conversational AI.…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Proceedings of the workshop on the Semantics and Pragmatics of Dialogue, SemDial, Maribor, Slovenia (2023)

  8. arXiv:2307.16689  [pdf, other]

    cs.CL

    No that's not what I meant: Handling Third Position Repair in Conversational Question Answering

    Authors: Vevake Balaraman, Arash Eshghi, Ioannis Konstas, Ioannis Papaioannou

    Abstract: The ability to handle miscommunication is crucial to robust and faithful conversational AI. People usually deal with miscommunication immediately as they detect it, using highly systematic interactional mechanisms called repair. One important type of repair is Third Position Repair (TPR) whereby a speaker is initially misunderstood but then corrects the misunderstanding as it becomes apparent afte…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Accepted at SIGDIAL'23

  9. arXiv:2307.15554  [pdf, other]

    cs.CL

    'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges

    Authors: Javier Chiyah-Garcia, Alessandro Suglia, Arash Eshghi, Helen Hastie

    Abstract: Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee. Addressees usually detect such ambiguities immediately and work with the speaker to repair them using meta-communicative Clarificational Exchanges (CE): a Clarification Request (CR) and a response. Here, we argue that the ability to generate and respond to CRs im…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted at SIGDIAL'23 (upcoming). Repository with code and experiments available at https://github.com/JChiyah/what-are-you-referring-to

  10. arXiv:2305.16519  [pdf, other]

    cs.CL

    The Dangers of trusting Stochastic Parrots: Faithfulness and Trust in Open-domain Conversational Question Answering

    Authors: Sabrina Chiesurin, Dimitris Dimakopoulos, Marco Antonio Sobrevilla Cabezudo, Arash Eshghi, Ioannis Papaioannou, Verena Rieser, Ioannis Konstas

    Abstract: Large language models are known to produce output which sounds fluent and convincing, but is also often wrong, e.g. "unfaithful" with respect to a rationale as retrieved from a knowledge base. In this paper, we show that task-based systems which exhibit certain advanced linguistic dialog behaviors, such as lexical alignment (repeating what the user said), are in fact preferred and trusted more, wh…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 5 pages, ACL Findings 2023

  11. arXiv:2202.12645  [pdf, other]

    cs.CL cs.AI

    Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge

    Authors: Javier Chiyah-Garcia, Alessandro Suglia, José Lopes, Arash Eshghi, Helen Hastie

    Abstract: Anaphoric expressions, such as pronouns and referential descriptions, are situated with respect to the linguistic context of prior turns, as well as the immediate visual environment. However, a speaker's referential descriptions do not always uniquely identify the referent, leading to ambiguities in need of resolution through subsequent clarificational exchanges. Thus, effective Ambiguity Detecti…

    Submitted 26 July, 2023; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: Accepted to AAAI 2022 DSTC10 Workshop

  12. arXiv:2103.08545  [pdf, other]

    cs.CL cs.AI

    A Study of Automatic Metrics for the Evaluation of Natural Language Explanations

    Authors: Miruna Clinciu, Arash Eshghi, Helen Hastie

    Abstract: As transparency becomes key for robotics and AI, it will be necessary to evaluate the methods through which transparency is provided, including automatically generated natural language (NL) explanations. Here, we explore parallels between the generation of such explanations and the much-studied field of evaluation of Natural Language Generation (NLG). Specifically, we investigate which of the NLG…

    Submitted 15 March, 2021; originally announced March 2021.

    Comments: Accepted at EACL 2021

    Report number: 2021.eacl-main.202

  13. arXiv:1910.01302  [pdf, other]

    cs.CL

    Data-Efficient Goal-Oriented Conversation with Dialogue Knowledge Transfer Networks

    Authors: Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon

    Abstract: Goal-oriented dialogue systems are now being widely adopted in industry where it is of key importance to maintain a rapid prototyping cycle for new products and domains. Data-driven dialogue system development has to be adapted to meet this requirement; therefore, reducing the amount of data and annotations necessary for training such systems is a central research problem. In this paper, we p…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: EMNLP 2019

    ACM Class: I.2.7

  14. arXiv:1909.06644  [pdf, ps, other]

    cs.CL

    Current Challenges in Spoken Dialogue Systems and Why They Are Critical for Those Living with Dementia

    Authors: Angus Addlesee, Arash Eshghi, Ioannis Konstas

    Abstract: Dialogue technologies such as Amazon's Alexa have the potential to transform the healthcare industry. However, current systems are not yet naturally interactive: they are often turn-based, have naive end-of-turn detection and completely ignore many types of verbal and visual feedback - such as backchannels, hesitation markers, filled pauses, gaze, brow furrows and disfluencies - that are crucial i…

    Submitted 14 September, 2019; originally announced September 2019.

    Comments: Published at Dialog for Good 2019 - Workshop on Speech and Language Technology Serving Society

    Journal ref: Dialog for Good (2019)

  15. arXiv:1908.05854  [pdf, other]

    cs.CL

    Few-Shot Dialogue Generation Without Annotated Data: A Transfer Learning Approach

    Authors: Igor Shalyminov, Sungjin Lee, Arash Eshghi, Oliver Lemon

    Abstract: Learning with minimal data is one of the key challenges in the development of practical, production-ready goal-oriented dialogue systems. In a real-world enterprise setting where dialogue systems are developed rapidly and are expected to work robustly for an ever-growing variety of domains, products, and scenarios, efficient learning from a limited number of examples becomes indispensable. In th…

    Submitted 16 August, 2019; originally announced August 2019.

    Comments: Accepted at SigDial 2019

    ACM Class: I.2.7

  16. arXiv:1903.05566  [pdf, ps, other]

    cs.CL cs.LG

    Benchmarking Natural Language Understanding Services for building Conversational Agents

    Authors: Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser

    Abstract: We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a l…

    Submitted 26 March, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: Accepted by IWSDS2019

  17. arXiv:1810.03352  [pdf, other]

    cs.CL

    Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems

    Authors: Igor Shalyminov, Arash Eshghi, Oliver Lemon

    Abstract: Spontaneous spoken dialogue is often disfluent, containing pauses, hesitations, self-corrections and false starts. Processing such phenomena is essential in understanding a speaker's intended meaning and controlling the flow of the conversation. Furthermore, this processing needs to be word-by-word incremental to allow further downstream processing to begin as early as possible in order to handle…

    Submitted 8 October, 2018; originally announced October 2018.

    Comments: 9 pages, 1 figure, 7 tables. Accepted as a full paper for SemDial 2018

    ACM Class: I.2.7

  18. arXiv:1709.10431  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

    Authors: Yanchao Yu, Arash Eshghi, Gregory Mills, Oliver Joseph Lemon

    Abstract: We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data has been collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a Learner needs to learn invented vis…

    Submitted 29 September, 2017; originally announced September 2017.

    Comments: 10 pages, THE 6TH WORKSHOP ON VISION AND LANGUAGE (VL'17)

  19. arXiv:1709.10426  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    Training an adaptive dialogue policy for interactive learning of visually grounded word meanings

    Authors: Yanchao Yu, Arash Eshghi, Oliver Lemon

    Abstract: We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and which ground the meaning representations that it produces. We…

    Submitted 29 September, 2017; originally announced September 2017.

    Comments: 11 pages, SIGDIAL 2016 Conference

  20. arXiv:1709.10423  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings

    Authors: Yanchao Yu, Arash Eshghi, Oliver Lemon

    Abstract: We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users and achieve good learning performance (accuracy) while minimis…

    Submitted 29 September, 2017; originally announced September 2017.

    Comments: 10 pages, RoboNLP Workshop from ACL Conference

  21. arXiv:1709.07858  [pdf, other]

    cs.CL

    Bootstrapping incremental dialogue systems from minimal data: the generalisation power of dialogue grammars

    Authors: Arash Eshghi, Igor Shalyminov, Oliver Lemon

    Abstract: We investigate an end-to-end method for automatically inducing task-based dialogue systems from small amounts of unannotated dialogue data. It combines an incremental semantic grammar - Dynamic Syntax and Type Theory with Records (DS-TTR) - with Reinforcement Learning (RL), where language generation and dialogue management are a joint decision problem. The systems thus produced are incremental: di…

    Submitted 22 September, 2017; originally announced September 2017.

    Comments: 11 pages, 4 figures, 2 tables. Accepted as a long paper for EMNLP 2017

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (ISBN 978-1-945626-83-8), pp 2210-2220. Copenhagen, Denmark September 7-11, 2017

  22. arXiv:1709.07840  [pdf, other]

    cs.CL

    Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena

    Authors: Igor Shalyminov, Arash Eshghi, Oliver Lemon

    Abstract: Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis; and it contains many sorts of disfluency such as mid-utterance/sentence hesitations, interruptions, and self-corrections. But training data for machine learning approaches to dialogue processing is often either cleaned-up or wholly synthetic in order to avoid such phenomena. The question then arises of how well systems t…

    Submitted 22 September, 2017; originally announced September 2017.

    Comments: 9 pages, 3 figures, 2 tables. Accepted as a full paper for SemDial 2017

    ACM Class: I.2.7

    Journal ref: Proceedings of the 21st Workshop on the Semantics and Pragmatics of Dialogue (ISSN 2308-2275), pp 125-133. Saarbrucken, Germany, 15-17 August 2017

  23. arXiv:1612.00347  [pdf, other]

    cs.CL cs.AI cs.HC

    Bootstrapping incremental dialogue systems: using linguistic knowledge to learn from minimal data

    Authors: Dimitrios Kalatzis, Arash Eshghi, Oliver Lemon

    Abstract: We present a method for inducing new dialogue systems from very small amounts of unannotated dialogue data, showing how word-level exploration using Reinforcement Learning (RL), combined with an incremental and semantic grammar - Dynamic Syntax (DS) - allows systems to discover, generate, and understand many new dialogue variants. The method avoids the use of expensive and time-consuming dialogue…

    Submitted 1 December, 2016; originally announced December 2016.