Showing 1–1 of 1 results for author: Scheithauer, H

  1. arXiv:2411.10068  [pdf, other]

    cs.CV

    Diachronic Document Dataset for Semantic Layout Analysis

    Authors: Thibault Clérice, Juliette Janes, Hugo Scheithauer, Sarah Bénière, Florian Cafiero, Laurent Romary, Simon Gabay, Benoît Sagot

    Abstract: We present a novel, open-access dataset designed for semantic layout analysis, built to support document recreation workflows through mapping with the Text Encoding Initiative (TEI) standard. This dataset includes 7,254 annotated pages spanning a large temporal range (1600-2024) of digitised and born-digital materials across diverse document types (magazines, papers from sciences and humanities, P…

    Submitted 15 November, 2024; originally announced November 2024.