Showing 1–1 of 1 results for author: Kostelník, M

  1. arXiv:2503.16664  [pdf, other]

    cs.CV

    TextBite: A Historical Czech Document Dataset for Logical Page Segmentation

    Authors: Martin Kostelník, Karel Beneš, Michal Hradiš

    Abstract: Logical page segmentation is an important step in document analysis, enabling better semantic representations, information retrieval, and text understanding. Previous approaches define logical segmentation either through text or geometric objects, relying on OCR or precise geometry. To avoid the need for OCR, we define the task purely as segmentation in the image domain. Furthermore, to ensure the…

    Submitted 20 March, 2025; originally announced March 2025.