LingoQA: Visual Question Answering for Autonomous Driving

Marcu, Ana-Maria; Chen, Long; Hünermann, Jan; Karnsund, Alice; Hanotte, Benoit; Chidananda, Prajwal; Nair, Saurabh; Badrinarayanan, Vijay; Kendall, Alex; Shotton, Jamie; Arani, Elahe; Sinavski, Oleg

arXiv:2312.14115 (cs)

[Submitted on 21 Dec 2023 (v1), last revised 26 Sep 2024 (this version, v4)]

Abstract: We introduce LingoQA, a novel dataset and benchmark for visual question answering in autonomous driving. The dataset contains 28K unique short video scenarios and 419K annotations. Evaluating state-of-the-art vision-language models on our benchmark shows that their performance is below human capabilities, with GPT-4V responding truthfully to 59.6% of the questions compared to 96.6% for humans. For evaluation, we propose a truthfulness classifier, called Lingo-Judge, that achieves a 0.95 Spearman correlation coefficient with human evaluations, surpassing existing techniques like METEOR, BLEU, CIDEr, and GPT-4. We establish a baseline vision-language model and run extensive ablation studies to understand its performance. We release our dataset and benchmark as an evaluation platform for vision-language models in autonomous driving.
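The abstract compares evaluation metrics by their Spearman correlation with human evaluations. As a minimal sketch of what that statistic measures, the Python below computes Spearman correlation as the Pearson correlation of rank vectors, with tied values sharing their average rank. The function names and the sample scores are hypothetical illustrations; they are not taken from the paper, its data, or its code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical scores for five model configurations (illustration only).
metric_scores = [0.91, 0.12, 0.55, 0.78, 0.30]  # automatic metric
human_scores = [0.95, 0.10, 0.40, 0.60, 0.35]   # human-rated truthfulness
print(round(spearman(metric_scores, human_scores), 3))
```

A value near 1 means the metric orders the configurations almost exactly as human raters do, which is the sense in which a coefficient of 0.95 indicates close agreement.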

Comments:	Accepted to ECCV 2024. Benchmark and dataset are available at this https URL
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.14115 [cs.RO]
	(or arXiv:2312.14115v4 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2312.14115

Submission history

From: Long Chen
[v1] Thu, 21 Dec 2023 18:40:34 UTC (36,820 KB)
[v2] Wed, 20 Mar 2024 00:23:39 UTC (37,644 KB)
[v3] Wed, 25 Sep 2024 17:14:15 UTC (8,426 KB)
[v4] Thu, 26 Sep 2024 15:30:00 UTC (8,426 KB)
