
Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes

Authors: Christoph Wick, Jochen Zöllner, Tobias Grüning

Abstract: In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding. Hereby, during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory-Cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10-20 times fewer parameters. Access our shared implementations via this link to GitHub: https://github.com/Planet-AI-GmbH/tfaip-hybrid-ctc-s2s.

Submitted 29 March, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: 15 pages, 6 tables, 3 figures

arXiv:2106.07881 [pdf]

Mixed Model OCR Training on Historical Latin Script for Out-of-the-Box Recognition and Finetuning

Authors: Christian Reul, Christoph Wick, Maximilian Nöth, Andreas Büttner, Maximilian Wehner, Uwe Springmann

Abstract: In order to apply Optical Character Recognition (OCR) to historical printings of Latin script fully automatically, we report on our efforts to construct a widely-applicable polyfont recognition model yielding text with a Character Error Rate (CER) around 2% when applied out-of-the-box. Moreover, we show how this model can be further finetuned to specific classes of printings with little manual and computational effort. The mixed or polyfont model is trained on a wide variety of materials, in terms of age (from the 15th to the 19th century), typography (various types of Fraktur and Antiqua), and languages (among others, German, Latin, and French). To optimize the results we combined established techniques of OCR training like pretraining, data augmentation, and voting. In addition, we used various preprocessing methods to enrich the training data and obtain more robust models. We also implemented a two-stage approach which first trains on all available, considerably unbalanced data and then refines the output by training on a selected more balanced subset. Evaluations on 29 previously unseen books resulted in a CER of 1.73%, outperforming a widely used standard model with a CER of 2.84% by almost 40%. Training a more specialized model for some unseen Early Modern Latin books starting from our mixed model led to a CER of 1.47%, an improvement of up to 50% compared to training from scratch and up to 30% compared to training from the aforementioned standard model. Our new mixed model is made openly available to the community.

Submitted 15 June, 2021; originally announced June 2021.

Comments: submitted to HIP'21
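
The "almost 40%" figure in the abstract of arXiv:2106.07881 is a relative error reduction, and it can be checked from the two reported CERs. The sketch below is only an illustration of that arithmetic; the function name `relative_reduction` is an assumption made here and does not come from the paper.

```python
def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Return the fraction of the baseline error removed by the new model."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - new_cer) / baseline_cer


# CERs in percent, as reported in the abstract for the 29 unseen books.
standard_cer = 2.84  # widely used standard model
mixed_cer = 1.73     # mixed (polyfont) model

reduction = relative_reduction(standard_cer, mixed_cer)
print(f"relative CER reduction: {reduction:.1%}")  # about 39%, i.e. "almost 40%"
```

The same calculation applies to the other relative improvements quoted in this listing, as long as both the baseline and the improved error rate are given.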

arXiv:2104.11559 [pdf, other]

doi 10.3390/info12110443

Optimizing small BERTs trained for German NER

Authors: Jochen Zöllner, Konrad Sperfeld, Christoph Wick, Roger Labahn

Abstract: Currently, the most widespread neural network architecture for training language models is the so-called BERT which led to improvements in various Natural Language Processing (NLP) tasks. In general, the larger the number of parameters in a BERT model, the better the results obtained in these NLP tasks. Unfortunately, the memory consumption and the training duration drastically increase with the size of these models. In this article, we investigate various training techniques of smaller BERT models: We combine different methods from other BERT variants like ALBERT, RoBERTa, and relative positional encoding. In addition, we propose two new fine-tuning modifications leading to better performance: Class-Start-End tagging and a modified form of Linear Chain Conditional Random Fields. Furthermore, we introduce Whole-Word Attention which reduces BERT's memory usage and leads to a small increase in performance compared to classical Multi-Head-Attention. We evaluate these techniques on five public German Named Entity Recognition (NER) tasks of which two are introduced by this article.

Submitted 1 November, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

Journal ref: MDPI Information 2021, vol. 12 nr. 11, article-nr. 443

arXiv:1909.04032 [pdf, other]

doi 10.3390/app9224853

OCR4all -- An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings

Authors: Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, Frank Puppe

Abstract: Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the complexity of the layout and the highly variant typography. Nevertheless, in the last few years great progress has been made in the area of historical OCR, resulting in several powerful open-source tools for preprocessing, layout recognition and segmentation, character recognition and post-processing. The drawback of these tools often is their limited applicability by non-technical users like humanist scholars and in particular the combined use of several tools in a workflow. In this paper we present an open-source OCR software called OCR4all, which combines state-of-the-art OCR components and continuous model training into a comprehensive workflow. A comfortable GUI allows error corrections not only in the final output, but already in early stages to minimize error propagations. Further on, extensive configuration capabilities are provided to set the degree of automation of the workflow and to make adaptations to the carefully selected default parameters for specific printings, if necessary. Experiments showed that users with minimal or no experience were able to capture the text of even the earliest printed books with manageable effort and great quality, achieving excellent character error rates (CERs) below 0.5%. The fully automated application on 19th century novels showed that OCR4all can considerably outperform the commercial state-of-the-art tool ABBYY Finereader on moderate layouts if suitably pretrained mixed OCR models are available.
The architecture of OCR4all allows the easy integration (or substitution) of newly developed tools for its main components by standardized interfaces like PageXML, thus aiming at continual higher automation for historical printings.

Submitted 9 September, 2019; originally announced September 2019.

Comments: submitted to MDPI - Applied Sciences

Journal ref: https://www.mdpi.com/2076-3417/9/22/4853/htm

arXiv:1905.06009 [pdf]

Structural Characterization of an Ionic Liquid in bulk and in nano-confined environment from MD simulations

Authors: Natasa Vucemilovic-Alagic, Radha D. Banhatti, Robert Stepic, Christian R. Wick, Daniel Berger, Mario Gaimann, Andreas Bear, Jens Harting, David M. Smith, Ana-Suncana Smith

Abstract: This article contains data on structural characterization of the [C2Mim][NTf2] in bulk and in nano-confined environment obtained using MD simulations. These data supplement those presented in the paper Insights from Molecular Dynamics Simulations on Structural Organization and Diffusive Dynamics of an Ionic Liquid at Solid and Vacuum Interfaces, where force fields with three different charge methods and three charge scaling factors were used for the analysis of the IL in the bulk, at the interface with the vacuum and the IL film in contact with a hydroxylated alumina surface. Here, we present details on the construction of the model systems in an extended detailed methods section. Furthermore, for best parametrization, structural and dynamic properties of IL in different environments are studied with certain features presented herein.

Submitted 15 May, 2019; originally announced May 2019.

Comments: 13 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:1903.09450

arXiv:1903.09450 [pdf]

doi 10.1016/j.jcis.2019.06.017

Insights from Molecular Dynamics Simulations on Structural Organization and Diffusive Dynamics of an Ionic Liquid at Solid and Vacuum Interfaces

Authors: Natasa Vucemilovic-Alagic, Radha D. Banhatti, Robert Stepic, Christian R. Wick, Daniel Berger, Mario U. Gaimann, Andreas Baer, Jens Harting, David M. Smith, Ana-Suncana Smith

Abstract: Hypothesis: A prototypical modelling approach is required for a full characterisation of the static and equilibrium dynamical properties of confined ionic liquids (ILs), in order to gain predictive power of properties that are difficult to extract from experiments. Such a protocol needs to be constructed by benchmarking molecular dynamics simulations against available experiments. Simulations: We perform an in-depth study of [C2Mim][NTf2] in bulk, at the vacuum and at hydroxylated alumina surface. Using the charge methods CHelpG, RESP-HF and RESP-B3LYP with charge scaling factors 1.0, 0.9 and 0.85, we search for an optimum non-polarizable force field by benchmarking against self-diffusion coefficients, surface tension, X-ray reflectivity data, and structural data. Findings: Benchmarking, which relies on establishing the significance of an appropriate size of the model systems and the length of the simulations, yields RESP-HF/0.9 as the best suited force field for this IL overall. A complete and accurate characterisation of the spatially-dependent internal configurational space and orientation of IL molecules relative to the solid and vacuum interfaces is obtained. Furthermore, the density and mobility of IL ions in the plane parallel and normal to the interfaces is evaluated and the correlation between the stratification and dynamics in the interfacial layers is detectable deep into the films.

Submitted 22 March, 2019; originally announced March 2019.

Comments: 14 pages, 9 figures in main text and 14 figures in Supporting Information

Journal ref: Journal of Colloid and Interface Science 553, 350-363 (2019)

arXiv:1810.03436 [pdf]

State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines

Authors: Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

Abstract: In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur scripts without book-specific training using mixed models, i.e. models trained to recognize a variety of fonts and typesets from previously unseen sources. We describe the training process leading to strong mixed OCR models and compare them to freely available models of the popular open source engines OCRopus and Tesseract as well as the commercial state of the art system ABBYY. For evaluation, we use a varied collection of unseen data from books, journals, and a dictionary from the 19th century. The experiments show that training mixed models with real data is superior to training with synthetic data and that the novel OCR engine Calamari outperforms the other engines considerably, on average reducing ABBYY's character error rate (CER) by over 70%, resulting in an average CER below 1%.

Submitted 8 October, 2018; originally announced October 2018.

Comments: Submitted to DHd 2019 (https://dhd2019.org/) which demands a... creative... submission format. Consequently, some captions might look weird and some links aren't clickable. Extended version with more technical details and some fixes to follow

arXiv:1807.02004 [pdf]

Calamari - A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition

Authors: Christoph Wick, Christian Reul, Frank Puppe

Abstract: Optical Character Recognition (OCR) on contemporary and historical data is still in the focus of many researchers. Especially historical prints require book specific trained OCR models to achieve applicable results (Springmann and Lüdeling, 2016, Reul et al., 2017a). To reduce the human effort for manually annotating ground truth (GT) various techniques such as voting and pretraining have shown to be very efficient (Reul et al., 2018a, Reul et al., 2018b). Calamari is a new open source OCR line recognition software that both uses state-of-the-art Deep Neural Networks (DNNs) implemented in Tensorflow and gives native support for techniques such as pretraining and voting. The customizable network architectures constructed of Convolutional Neural Networks (CNNs) and Long-Short-Term-Memory (LSTM) layers are trained by the so-called Connectionist Temporal Classification (CTC) algorithm of Graves et al. (2006). Optional usage of a GPU drastically reduces the computation times for both training and prediction. We use two different datasets to compare the performance of Calamari to OCRopy, OCRopus3, and Tesseract 4. Calamari reaches a Character Error Rate (CER) of 0.11% on the UW3 dataset written in modern English and 0.18% on the DTA19 dataset written in German Fraktur, which considerably outperforms the results of the existing software.

Submitted 6 August, 2018; v1 submitted 5 July, 2018; originally announced July 2018.

Comments: 11 pages, 3 figures

Journal ref: Digital Humanities Quarterly 14 (2), 2020

arXiv:1802.10038 [pdf, other]

Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning

Authors: Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

Abstract: We combine three methods which significantly improve the OCR accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch. (2) Performing cross fold training on a single set of ground truth data (line images and their transcriptions) with a single OCR engine (OCRopus) produces a committee whose members then vote for the best outcome by also taking the top-N alternatives and their intrinsic confidence values into account. (3) Following the principle of maximal disagreement we select additional training lines which the voters disagree most on, expecting them to offer the highest information gain for a subsequent training (active learning). Evaluations on six early printed books yielded the following results: On average the combination of pretraining and voting improved the character accuracy by 46% when training five folds starting from the same mixed model. This number rose to 53% when using different models for pretraining, underlining the importance of diverse voters. Incorporating active learning improved the obtained results by another 16% on average (evaluated on three of the six books). Overall, the proposed methods lead to an average error rate of 2.5% when training on only 60 lines. Using a substantial ground truth pool of 1,000 lines brought the error rate down even further to less than 1% on average.

Submitted 28 February, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: Submitted to JLCL Volume 33 (2018), Issue 1: Special Issue on Automatic Text and Layout Recognition

arXiv:1802.10033 [pdf, other]

Improving OCR Accuracy on Early Printed Books using Deep Convolutional Networks

Authors: Christoph Wick, Christian Reul, Frank Puppe

Abstract: This paper proposes a combination of a convolutional and an LSTM network to improve the accuracy of OCR on early printed books. While the standard model of line based OCR uses a single LSTM layer, we utilize a CNN- and Pooling-Layer combination in advance of an LSTM layer. Due to the higher amount of trainable parameters the performance of the network relies on a high amount of training examples to unleash its power. Hereby, the error is reduced by up to 44%, yielding a CER of 1% and below. To further improve the results we use a voting mechanism to achieve character error rates (CER) below 0.5%. The runtime of the deep model for training and prediction of a book behaves very similar to a shallow network.

Submitted 27 February, 2018; originally announced February 2018.

Comments: 16 pages, 4 figures, 8 tables, submitted to JLCL Volume 33 (2018), Issue 1

arXiv:1712.05586 [pdf]

Transfer Learning for OCRopus Model Training on Early Printed Books

Authors: Christian Reul, Christoph Wick, Uwe Springmann, Frank Puppe

Abstract: A method is presented that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books when only small amounts of diplomatic transcriptions are available. This is achieved by building from already existing models during training instead of starting from scratch. To overcome the discrepancies between the set of characters of the pretrained model and the additional ground truth the OCRopus code is adapted to allow for alphabet expansion or reduction. The character set is now capable of flexibly adding and deleting characters from the pretrained alphabet when an existing model is loaded. For our experiments we use a self-trained mixed model on early Latin prints and the two standard OCRopus models on modern English and German Fraktur texts. The evaluation on seven early printed books showed that training from the Latin mixed model reduces the average number of errors by 43% and 26% compared to training from scratch with 60 and 150 lines of ground truth, respectively. Furthermore, it is shown that even building from mixed models trained on data unrelated to the newly added training and test data can lead to significantly improved recognition results.

Submitted 21 December, 2017; v1 submitted 15 December, 2017; originally announced December 2017.

arXiv:1712.00967 [pdf, other]

Leaf Identification Using a Deep Convolutional Neural Network

Authors: Christoph Wick, Frank Puppe

Abstract: Convolutional neural networks (CNNs) have become popular especially in computer vision in the last few years because they achieved outstanding performance on different tasks, such as image classifications. We propose a nine-layer CNN for leaf identification using the famous Flavia and Foliage datasets. Usually the supervised learning of deep CNNs requires huge datasets for training. However, the used datasets contain only a few examples per plant species. Therefore, we apply data augmentation and transfer learning to prevent our network from overfitting. The trained CNNs achieve recognition rates above 99% on the Flavia and Foliage datasets, and slightly outperform current methods for leaf classification.

Submitted 4 December, 2017; originally announced December 2017.

arXiv:1711.09670 [pdf, other]

doi 10.1109/DAS.2018.30

Improving OCR Accuracy on Early Printed Books by utilizing Cross Fold Training and Voting

Authors: Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe

Abstract: In this paper we introduce a method that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books. The method uses a combination of cross fold training and confidence based voting. After allocating the available ground truth in different subsets several training processes are performed, each resulting in a specific OCR model. The OCR text generated by these models then gets voted to determine the final output by taking the recognized characters, their alternatives, and the confidence values assigned to each character into consideration. Experiments on seven early printed books show that the proposed method outperforms the standard approach considerably by reducing the amount of errors by up to 50% and more.

Submitted 27 November, 2017; originally announced November 2017.

arXiv:1711.07695 [pdf, other]

doi 10.1109/DAS.2018.39

Fully Convolutional Neural Networks for Page Segmentation of Historical Document Images

Authors: Christoph Wick, Frank Puppe

Abstract: We propose a high-performance fully convolutional neural network (FCN) for historical document segmentation that is designed to process a single page in one step. The advantage of this model besides its speed is its ability to directly learn from raw pixels instead of using preprocessing steps, e.g. feature computation or superpixel generation. We show that this network yields better results than existing methods on different public data sets. For evaluation of this model we introduce a novel metric that is independent of ambiguous ground truth called Foreground Pixel Accuracy (FgPA). This pixel based measure only counts foreground pixels in the binarized page; any background pixel is omitted. The major advantage of this metric is that it enables researchers to compare different segmentation methods on their ability to successfully segment text or pictures and not on their ability to learn and possibly overfit the peculiarities of an ambiguous hand-made ground truth segmentation.

Submitted 15 February, 2018; v1 submitted 21 November, 2017; originally announced November 2017.

Comments: 6 pages, 7 figures, conference
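
The Foreground Pixel Accuracy (FgPA) described in the abstract of arXiv:1711.07695 counts only foreground pixels of the binarized page. A minimal sketch of that idea follows, assuming per-pixel integer class labels and a boolean foreground mask. The function name, the label values, and the toy arrays are hypothetical, and the paper's exact definition (for example, how the page is binarized) may differ.

```python
import numpy as np


def foreground_pixel_accuracy(pred, truth, foreground) -> float:
    """Share of foreground pixels whose predicted label matches the ground truth.

    pred, truth: integer label maps of identical shape.
    foreground:  mask of the same shape, truthy where the binarized page has ink.
    """
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    fg = np.asarray(foreground, dtype=bool)
    if not (pred.shape == truth.shape == fg.shape):
        raise ValueError("pred, truth and foreground must have the same shape")
    if not fg.any():
        raise ValueError("foreground mask contains no foreground pixels")
    # Boolean indexing keeps only foreground pixels; background is ignored.
    return float((pred[fg] == truth[fg]).mean())


# Toy 2x3 page with hypothetical labels: 0 = background, 1 = text, 2 = image.
foreground = [[1, 0, 1], [0, 1, 1]]
truth = [[1, 0, 2], [0, 1, 2]]
pred = [[1, 1, 2], [0, 2, 2]]  # one background error, one foreground error

print(foreground_pixel_accuracy(pred, truth, foreground))  # 0.75
```

In the toy page, the mismatch at the background pixel does not affect the score, while the single mislabeled foreground pixel out of four lowers it to 0.75.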

arXiv:1706.04338 [pdf, other]

Playing Music in Just Intonation - A Dynamically Adapting Tuning Scheme

Authors: Karolin Stange, Christoph Wick, Haye Hinrichsen

Abstract: We investigate a dynamically adapting tuning scheme for microtonal tuning of musical instruments, allowing the performer to play music in just intonation in any key. Unlike other methods, which are based on a procedural analysis of the chordal structure, the tuning scheme continually solves a system of linear equations without making explicit decisions. In complex situations, where not all intervals of a chord can be tuned according to just frequency ratios, the method automatically yields a tempered compromise. We outline the implementation of the algorithm in an open-source software project that we have provided in order to demonstrate the feasibility of the tuning method.

Submitted 11 June, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

Comments: 22 pages, 7 figures

arXiv:1508.01652 [pdf, other]

doi 10.1088/1751-8113/49/2/025303

Entanglement formation under random interactions

Authors: Christoph Wick, Jaegon Um, Haye Hinrichsen

Abstract: The temporal evolution of the entanglement between two qubits evolving by random interactions is studied analytically and numerically. Two different types of randomness are investigated. Firstly we analyze an ensemble of systems with randomly chosen but time-independent interaction Hamiltonians. Secondly we consider the case of a temporally fluctuating Hamiltonian, where the unitary evolution can be understood as a random walk on the SU(4) group manifold. As a by-product we compute the metric tensor and its inverse as well as the Laplace-Beltrami operator for SU(4).

Submitted 19 October, 2015; v1 submitted 7 August, 2015; originally announced August 2015.

Comments: Latex, 24 pages, 4 figures, Supplement material in source archive

Showing 1–16 of 16 results for author: Wick, C