Showing 1–23 of 23 results for author: Hradis, M

  1. arXiv:2504.00558 [pdf, other]

    cs.CV

    Archival Faces: Detection of Faces in Digitized Historical Documents

    Authors: Marek Vaško, Adam Herout, Michal Hradiš

    Abstract: When digitizing historical archives, it is necessary to search for the faces of celebrities and ordinary people, especially in newspapers, link them to the surrounding text, and make them searchable. Existing face detectors on datasets of scanned historical documents fail remarkably -- current detection tools only achieve around $24\%$ mAP at $50:90\%$ IoU. This work compensates for this failure b…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 15 pages, 6 figures, 6 tables

    MSC Class: 68T45 (Primary) 68T10; 68T07 (Secondary) ACM Class: I.4.8; I.5.1

  2. arXiv:2503.22526 [pdf, other]

    cs.CV cs.AI cs.LG

    AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization

    Authors: Martin Kišš, Michal Hradiš, Martina Dvořáková, Václav Jiroušek, Filip Kersch

    Abstract: We introduce the AnnoPage Dataset, a novel collection of 7550 pages from historical documents, primarily in Czech and German, spanning from 1485 to the present, focusing on the late 19th and early 20th centuries. The dataset is designed to support research in document layout analysis and object detection. Each page is annotated with axis-aligned bounding boxes (AABB) representing elements of 25 ca…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 15 pages, 2 tables, 6 figures; Submitted to ICDAR25

  3. arXiv:2503.22513 [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets

    Authors: Martin Kišš, Michal Hradiš

    Abstract: Self-supervised learning has emerged as a powerful approach for leveraging large-scale unlabeled data to improve model performance in various domains. In this paper, we explore masked self-supervised pre-training for text recognition transformers. Specifically, we propose two modifications to the pre-training phase: progressively increasing the masking probability, and modifying the loss function…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 18 pages, 7 tables, 6 figures; Submitted to ICDAR25

  4. arXiv:2503.19658 [pdf, other]

    cs.CV cs.AI cs.LG

    BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction

    Authors: Jan Kohút, Martin Dočekal, Michal Hradiš, Marek Vaško

    Abstract: Manual digitization of bibliographic metadata is time consuming and labor intensive, especially for historical and real-world archives with highly variable formatting across documents. Despite advances in machine learning, the absence of dedicated datasets for metadata extraction hinders automation. To address this gap, we introduce BiblioPage, a dataset of scanned title pages annotated with struc…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Submitted to ICDAR2025 conference

  5. arXiv:2503.19546 [pdf, other]

    cs.CV

    Practical Fine-Tuning of Autoregressive Models on Limited Handwritten Texts

    Authors: Jan Kohút, Michal Hradiš

    Abstract: A common use case for OCR applications involves users uploading documents and progressively correcting automatic recognition to obtain the final transcript. This correction phase presents an opportunity for progressive adaptation of the OCR model, making it crucial to adapt early, while ensuring stability and reliability. We demonstrate that state-of-the-art transformer-based models can effectivel…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Submitted to ICDAR2025 conference

  6. arXiv:2503.16664 [pdf, other]

    cs.CV

    TextBite: A Historical Czech Document Dataset for Logical Page Segmentation

    Authors: Martin Kostelník, Karel Beneš, Michal Hradiš

    Abstract: Logical page segmentation is an important step in document analysis, enabling better semantic representations, information retrieval, and text understanding. Previous approaches define logical segmentation either through text or geometric objects, relying on OCR or precise geometry. To avoid the need for OCR, we define the task purely as segmentation in the image domain. Furthermore, to ensure the…

    Submitted 20 March, 2025; originally announced March 2025.

  7. arXiv:2412.17933 [pdf, other]

    cs.CL cs.AI

    BenCzechMark : A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism

    Authors: Martin Fajcik, Martin Docekal, Jan Dolezal, Karel Ondrej, Karel Beneš, Jan Kapsa, Pavel Smrz, Alexander Polok, Michal Hradis, Zuzana Neverilova, Ales Horak, Radoslav Sabol, Michal Stefanik, Adam Jirkovsky, David Adamczyk, Petr Hyner, Jan Hula, Hynek Kydlicek

    Abstract: We present BenCzechMark (BCM), the first comprehensive Czech language benchmark designed for large language models, offering diverse tasks, multiple task formats, and multiple evaluation metrics. Its duel scoring system is grounded in statistical significance theory and uses aggregation across tasks inspired by social preference theory. Our benchmark encompasses 50 challenging tasks, with correspo…

    Submitted 22 May, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: Accepted to TACL

  8. arXiv:2411.12921 [pdf, other]

    cs.IR cs.AI

    A Comparative Study of Text Retrieval Models on DaReCzech

    Authors: Jakub Stetina, Martin Fajcik, Michal Stefanik, Michal Hradis

    Abstract: This article presents a comprehensive evaluation of 7 off-the-shelf document retrieval models: Splade, Plaid, Plaid-X, SimCSE, Contriever, OpenAI ADA and Gemma2 chosen to determine their performance on the Czech retrieval dataset DaReCzech. The primary objective of our experiments is to estimate the quality of modern retrieval approaches in the Czech language. Our analyses include retrieval qualit…

    Submitted 20 December, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

  9. arXiv:2405.00420 [pdf, other]

    cs.CV cs.AI cs.LG

    Self-supervised Pre-training of Text Recognizers

    Authors: Martin Kišš, Michal Hradiš

    Abstract: In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them. Therefore, methods utilizing unlabeled data are researched. We study self-supervised pre-training methods based on masked label prediction using three different a…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 18 pages, 6 figures, 4 tables, accepted to ICDAR24

  10. arXiv:2302.06318 [pdf, other]

    cs.CV

    Towards Writing Style Adaptation in Handwriting Recognition

    Authors: Jan Kohút, Michal Hradiš, Martin Kišš

    Abstract: One of the challenges of handwriting recognition is to transcribe a large number of vastly different writing styles. State-of-the-art approaches do not explicitly use information about the writer's style, which may be limiting overall accuracy due to various ambiguities. We explore models with writer-dependent parameters which take the writer's identity as an additional input. The proposed models…

    Submitted 30 April, 2025; v1 submitted 13 February, 2023; originally announced February 2023.

  11. arXiv:2302.06308 [pdf, other]

    cs.CV

    Fine-tuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition

    Authors: Jan Kohút, Michal Hradiš

    Abstract: In many machine learning tasks, a large general dataset and a small specialized dataset are available. In such situations, various domain adaptation methods can be used to adapt a general model to the target dataset. We show that in the case of neural networks trained for handwriting recognition using CTC, simple fine-tuning with data augmentation works surprisingly well in such scenarios and that…

    Submitted 30 April, 2025; v1 submitted 13 February, 2023; originally announced February 2023.

  12. arXiv:2212.02135 [pdf, other]

    cs.LG cs.CV

    SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels

    Authors: Martin Kišš, Michal Hradiš, Karel Beneš, Petr Buchal, Michal Kula

    Abstract: This paper explores semi-supervised training for sequence tasks, such as Optical Character Recognition or Automatic Speech Recognition. We propose a novel loss function – SoftCTC – which is an extension of CTC allowing to consider multiple transcription variants at the same time. This allows to omit the confidence based filtering step which is otherwise a crucial co…

    Submitted 19 September, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: 21 pages, 8 figures, 6 tables, accepted to International Journal on Document Analysis and Recognition (IJDAR)

    MSC Class: 68T07; 68T10

  13. arXiv:2201.09575 [pdf, other]

    cs.CV

    Importance of Textlines in Historical Document Classification

    Authors: Martin Kišš, Jan Kohút, Karel Beneš, Michal Hradiš

    Abstract: This paper describes a system prepared at Brno University of Technology for ICDAR 2021 Competition on Historical Document Classification, experiments leading to its design, and the main findings. The solved tasks include script and font classification, document origin localization, and dating. We combined patch-level and line-level approaches, where the line-level system utilizes an existing, publ…

    Submitted 30 March, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: 13 pages, 7 figures, 5 tables

    MSC Class: 68T07; 68T10

  14. AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions

    Authors: Martin Kišš, Karel Beneš, Michal Hradiš

    Abstract: This paper addresses text recognition for domains with limited manual annotations by a simple self-training strategy. Our approach should reduce human annotation effort when target domain data is plentiful, such as when transcribing a collection of single person's correspondence or a large manuscript. We propose to train a seed system on large scale data from related domains mixed with available a…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: 15 pages, 6 figures, 5 tables

  15. TS-Net: OCR Trained to Switch Between Text Transcription Styles

    Authors: Jan Kohút, Michal Hradiš

    Abstract: Users of OCR systems, from different institutions and scientific disciplines, prefer and produce different transcription styles. This presents a problem for training of consistent text recognition neural networks on real-world data. We propose to extend existing text recognition networks with a Transcription Style Block (TSB) which can learn from data to switch between multiple transcription style…

    Submitted 13 February, 2023; v1 submitted 9 March, 2021; originally announced March 2021.

    Journal ref: ICDAR 2021: Proceedings, Part IV 16 (pp. 478-493)

  16. arXiv:2102.11838 [pdf, other]

    cs.CV

    Page Layout Analysis System for Unconstrained Historic Documents

    Authors: Oldřich Kodym, Michal Hradiš

    Abstract: Extraction of text regions and individual text lines from historic documents is necessary for automatic transcription. We propose extending a CNN-based text baseline detection system by adding line height and text block boundary predictions to the model output, allowing the system to extract more comprehensive layout information. We also show that pixel-wise text orientation prediction can be used…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: Submitted to ICDAR2021 conference

  17. arXiv:1907.01307 [pdf, other]

    cs.CV

    Brno Mobile OCR Dataset

    Authors: Martin Kišš, Michal Hradiš, Oldřich Kodym

    Abstract: We introduce the Brno Mobile OCR Dataset (B-MOD) for document Optical Character Recognition from low-quality images captured by handheld mobile devices. While OCR of high-quality scanned documents is a mature field where many commercial tools are available, and large datasets of text in the wild exist, no existing datasets can be used to develop and test document OCR methods robust to non-uniform…

    Submitted 2 July, 2019; originally announced July 2019.

  18. arXiv:1712.06352 [pdf, other]

    cs.RO

    CNN for IMU Assisted Odometry Estimation using Velodyne LiDAR

    Authors: Martin Velas, Michal Spanel, Michal Hradis, Adam Herout

    Abstract: We introduce a novel method for odometry estimation using convolutional neural networks from 3D LiDAR scans. The original sparse data are encoded into 2D matrices for the training of proposed networks and for the prediction. Our networks show significantly better precision in the estimation of translational motion parameters comparing with state of the art method LOAM, while achieving real-time pe…

    Submitted 18 December, 2017; originally announced December 2017.

  19. arXiv:1709.02128 [pdf, other]

    cs.RO

    CNN for Very Fast Ground Segmentation in Velodyne LiDAR Data

    Authors: Martin Velas, Michal Spanel, Michal Hradis, Adam Herout

    Abstract: This paper presents a novel method for ground segmentation in Velodyne point clouds. We propose an encoding of sparse 3D data from the Velodyne sensor suitable for training a convolutional neural network (CNN). This general purpose approach is used for segmentation of the sparse point cloud into ground and non-ground points. The LiDAR data are represented as a multi-channel 2D signal where the hor…

    Submitted 7 September, 2017; originally announced September 2017.

    Comments: ICRA 2018 submission

  20. Camera Elevation Estimation from a Single Mountain Landscape Photograph

    Authors: Martin Cadik, Jan Vasicek, Michal Hradis, Filip Radenovic, Ondrej Chum

    Abstract: This work addresses the problem of camera elevation estimation from a single photograph in an outdoor environment. We introduce a new benchmark dataset of one-hundred thousand images with annotated camera elevation called Alps100K. We propose and experimentally evaluate two automatic data-driven approaches to camera elevation estimation: one based on convolutional neural networks, the other on loc…

    Submitted 12 July, 2016; originally announced July 2016.

    Journal ref: In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference (BMVC), pages 30.1-30.12. BMVA Press, September 2015

  21. arXiv:1605.00366 [pdf, other]

    cs.CV

    Compression Artifacts Removal Using Convolutional Neural Networks

    Authors: Pavel Svoboda, Michal Hradis, David Barina, Pavel Zemcik

    Abstract: This paper shows that it is possible to train large and deep convolutional neural networks (CNN) for JPEG compression artifacts reduction, and that such networks can provide significantly better reconstruction quality compared to previously used smaller networks as well as to any other state-of-the-art methods. We were able to train networks with 8 layers in a single step and in relatively short t…

    Submitted 2 May, 2016; originally announced May 2016.

    Comments: To be published in WSCG 2016

  22. arXiv:1602.07873 [pdf, other]

    cs.CV

    CNN for License Plate Motion Deblurring

    Authors: Pavel Svoboda, Michal Hradis, Lukas Marsik, Pavel Zemcik

    Abstract: In this work we explore the previously proposed approach of direct blind deconvolution and denoising with convolutional neural networks in a situation where the blur kernels are partially constrained. We focus on blurred images from a real-life traffic surveillance system, on which we, for the first time, demonstrate that neural networks trained on artificial data provide superior reconstruction q…

    Submitted 25 February, 2016; originally announced February 2016.

  23. arXiv:1506.03995 [pdf, other]

    cs.CV

    Technical Report: Image Captioning with Semantically Similar Images

    Authors: Martin Kolář, Michal Hradiš, Pavel Zemčík

    Abstract: This report presents our submission to the MS COCO Captioning Challenge 2015. The method uses Convolutional Neural Network activations as an embedding to find semantically similar images. From these images, the most typical caption is selected based on unigram frequencies. Although the method received low scores with automated evaluation metrics and in human assessed average correctness, it is com…

    Submitted 12 June, 2015; originally announced June 2015.

    Comments: 3 pages