Asking questions on handwritten document collections

Mathew, Minesh; Gomez, Lluis; Karatzas, Dimosthenis; Jawahar, CV

doi:10.1007/s10032-021-00383-3

arXiv:2110.00711 (cs)

[Submitted on 2 Oct 2021]

Abstract: This work addresses the problem of Question Answering (QA) on handwritten document collections. Unlike typical QA and Visual Question Answering (VQA) formulations where the answer is a short text, we aim to locate a document snippet where the answer lies. The proposed approach works without recognizing the text in the documents. We argue that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult. At the same time, for human users, document image snippets containing answers act as a valid alternative to textual answers. The proposed approach uses an off-the-shelf deep embedding network which can project both textual words and word images into a common sub-space. This embedding bridges the textual and visual domains and helps us retrieve document snippets that potentially answer a question. We evaluate the proposed approach on two new datasets: (i) HW-SQuAD: a synthetic, handwritten document image counterpart of the SQuAD1.0 dataset and (ii) BenthamQA: a smaller set of QA pairs defined on documents from the popular Bentham manuscripts collection. We also present a thorough analysis of the proposed recognition-free approach compared to a recognition-based approach which uses text recognized from the images using OCR. Datasets presented in this work are available to download at this http URL.
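
The retrieval idea in the abstract (question words and word images embedded in one common sub-space, snippets ranked by how well they match the question) can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, random vectors stand in for the output of an embedding network, and the scoring rule (best cosine match per question word, averaged over the question) is an assumption chosen for simplicity.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def snippet_score(query_vecs, word_vecs):
    """Assumed scoring rule: for each question word, take its best-matching
    word image in the snippet, then average over the question words.
    Expects a non-empty snippet."""
    best = [max(cosine(q, w) for w in word_vecs) for q in query_vecs]
    return sum(best) / len(best)


def rank_snippets(query_vecs, snippets):
    """Return snippet indices sorted best-first, plus the raw scores."""
    scores = [snippet_score(query_vecs, s) for s in snippets]
    order = sorted(range(len(snippets)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Stand-in data: random vectors play the role of embeddings in the common space.
rng = random.Random(0)
DIM = 8


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


query = [rand_vec() for _ in range(3)]  # embedded question words
unrelated = [rand_vec() for _ in range(5)]  # snippet with unrelated word images
# Snippet whose word images embed close to the question words, plus two extras.
related = [[x + 0.01 * rng.gauss(0.0, 1.0) for x in q] for q in query]
related += [rand_vec() for _ in range(2)]

order, scores = rank_snippets(query, [unrelated, related])
print("best snippet index:", order[0])
```

Because both modalities live in the same space, no text recognition step appears anywhere in the sketch; the ranking depends only on distances between embeddings.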

Comments:	pre-print version
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2110.00711 [cs.CV]
	(or arXiv:2110.00711v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2110.00711
Journal reference:	Int. J. Document Anal. Recognit. 24(3), 235-249 (2021)
Related DOI:	https://doi.org/10.1007/s10032-021-00383-3

Submission history

From: Minesh Mathew
[v1] Sat, 2 Oct 2021 02:40:40 UTC (16,307 KB)
