A Transformer-based Math Language Model for Handwritten Math Expression Recognition

Ung, Huy Quang; Nguyen, Cuong Tuan; Nguyen, Hung Tuan; Truong, Thanh-Nghia; Nakagawa, Masaki

arXiv:2108.05002 (cs)

[Submitted on 11 Aug 2021]

Abstract: Handwritten mathematical expressions (HMEs) are ambiguous to interpret, sometimes even for human readers. Several math symbols look very similar when handwritten, such as dot and comma, or 0, O, and o, which makes them difficult for HME recognition systems to distinguish without contextual information. To address this problem, this paper presents a Transformer-based Math Language Model (TMLM). Using the self-attention mechanism, the high-level representation of each input token in a sequence is computed from how it relates to the preceding tokens. Thus, TMLM can capture long-range dependencies and correlations among symbols and relations in a mathematical expression (ME). We trained the proposed language model on a corpus of approximately 70,000 LaTeX sequences provided in CROHME 2016. TMLM achieved a perplexity of 4.42, outperforming previous math language models, i.e., the N-gram and recurrent neural network-based language models. In addition, we integrate TMLM into a stochastic context-free grammar-based HME recognition system, using a weighting parameter to re-rank the top-10 candidates. The expression rates on the testing sets of CROHME 2016 and CROHME 2019 improved by 2.97 and 0.83 percentage points, respectively.
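
The two quantities mentioned in the abstract, perplexity and weighted re-ranking, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the candidate tuple layout, and the log-linear combination `recognizer_score + weight * lm_log_prob` are assumptions made for illustration; the abstract states only that a weighting parameter is used to re-rank the top-10 candidates.

```python
import math
from typing import List, Tuple


def perplexity(token_log_probs: List[float]) -> float:
    """Perplexity of a token sequence: exp of the negative mean
    natural-log probability that the language model assigns per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rerank(
    candidates: List[Tuple[str, float, float]],
    weight: float,
) -> List[Tuple[str, float]]:
    """Re-rank recognition candidates with a language-model score.

    Each candidate is (latex, recognizer_log_score, lm_log_prob).
    ASSUMPTION: the combined score is a log-linear interpolation,
    recognizer_log_score + weight * lm_log_prob. The abstract does not
    give the exact formula.
    """
    scored = [(latex, rec + weight * lm) for latex, rec, lm in candidates]
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical scores: the recognizer slightly prefers the letter "O",
# while the language model strongly prefers the digit "0".
example = [("x_0", -1.0, -5.0), ("x_O", -0.9, -9.0)]
best_latex, best_score = rerank(example, weight=0.5)[0]
```

With `weight=0` the ranking reduces to the recognizer's own order; a larger weight lets the language model resolve visually ambiguous symbols such as 0 and O.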

Comments:	14 pages, accepted in ICDAR-DIL 2021
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2108.05002 [cs.CL]
	(or arXiv:2108.05002v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2108.05002

Submission history

From: Huy Quang Ung
[v1] Wed, 11 Aug 2021 03:03:48 UTC (406 KB)
