Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Barman, Raphaël; Ehrmann, Maud; Clematide, Simon; Oliveira, Sofia Ares; Kaplan, Frédéric

doi:10.46298/jdmdh.6107

arXiv:2002.06144 (cs)

[Submitted on 14 Feb 2020 (v1), last revised 14 Dec 2020 (this version, v4)]

Abstract: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts that seek to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as a first essential step. Although the identification and categorization of segments of interest in document images have seen significant progress in recent years thanks to deep learning techniques, many challenges remain, including the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to high material variance.
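
The abstract does not say how the visual and textual features are combined, so the following is only a minimal sketch of one plausible fusion scheme, not necessarily the authors' method. It assumes each OCR word comes with a bounding box and an embedding vector, paints that vector into every pixel the box covers, and concatenates the resulting text map with the image channels. The names `Word`, `build_text_map`, and `fuse` are hypothetical and introduced here for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical word record (an assumption, not from the paper):
# (x0, y0, x1, y1, embedding), with the box end-exclusive in pixel coordinates.
Word = Tuple[int, int, int, int, Sequence[float]]


def build_text_map(
    height: int, width: int, words: Sequence[Word], dim: int
) -> List[List[List[float]]]:
    """Return a height x width x dim map holding word embeddings under each box."""
    # Pixels not covered by any word keep an all-zero vector.
    text_map = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for x0, y0, x1, y1, emb in words:
        if len(emb) != dim:
            raise ValueError("embedding length does not match dim")
        # Clip the box to the page so out-of-range coordinates are ignored.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                text_map[y][x] = list(emb)
    return text_map


def fuse(
    image: Sequence[Sequence[Sequence[float]]],
    text_map: Sequence[Sequence[Sequence[float]]],
) -> List[List[List[float]]]:
    """Concatenate visual channels and text channels pixel by pixel."""
    return [
        [list(pixel) + list(text) for pixel, text in zip(img_row, txt_row)]
        for img_row, txt_row in zip(image, text_map)
    ]


# Toy page: 3 x 4 pixels, 3 visual channels, one word with a 2-dimensional embedding.
image = [[[0.5, 0.5, 0.5] for _ in range(4)] for _ in range(3)]
words = [(1, 0, 3, 2, [1.0, -1.0])]
fused = fuse(image, build_text_map(3, 4, words, 2))
```

In this sketch a segmentation network would take `fused` as input, with five channels per pixel instead of three. Fusing at the input is only one option; the text map could equally be injected at a later layer.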

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2002.06144 [cs.CV]
	(or arXiv:2002.06144v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2002.06144
Journal reference:	Journal of Data Mining & Digital Humanities, HistoInformatics, HistoInformatics (January 19, 2021) jdmdh:6107
Related DOI:	https://doi.org/10.46298/jdmdh.6107

Submission history

From: Raphaël Barman
[v1] Fri, 14 Feb 2020 17:56:18 UTC (6,390 KB)
[v2] Mon, 11 May 2020 11:59:18 UTC (6,376 KB)
[v3] Sun, 20 Sep 2020 07:45:29 UTC (6,376 KB)
[v4] Mon, 14 Dec 2020 16:56:29 UTC (6,376 KB)