Word and character segmentation directly in run-length compressed handwritten document images

R, Amarnath; Nagabhushan, P.; Javed, Mohammed

arXiv:1909.05146 (cs)

[Submitted on 18 Aug 2019]

Abstract: Prior work has demonstrated that performing text-line segmentation directly on run-length compressed handwritten document images significantly reduces computational time and memory usage. In this paper, we investigate word and character segmentation directly on run-length compressed document images. First, the spreads of the characters are extracted from the foreground runs of the compressed data, and connected components are then established. The spacing between connected components is larger between adjacent words than within a word. Using this observation, a threshold is chosen empirically for inter-word separation. Every connected component within a word is then analysed for character segmentation, where a min-cut graph concept is used to separate touching characters. Over-segmentation and under-segmentation are addressed by insertion and deletion operations, respectively. The approach was developed specifically for compressed handwritten English document images; however, the model has also been tested on non-English document images.
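
The inter-word thresholding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each row of a text line is stored as alternating run lengths beginning with a background run, it approximates connected components by merging the column spans of foreground runs (a simple projection, not true component labelling), and the function names and the `gap_threshold` value are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start column inclusive, end column exclusive)


def foreground_spans(runs: List[int]) -> List[Span]:
    """Turn one row of alternating run lengths into foreground column spans.

    Assumption: the row starts with a background run, so runs at odd
    positions (1, 3, 5, ...) are foreground.
    """
    spans = []
    col = 0
    for i, length in enumerate(runs):
        if i % 2 == 1 and length > 0:
            spans.append((col, col + length))
        col += length
    return spans


def merge_spans(spans: List[Span]) -> List[Span]:
    """Merge overlapping or touching spans into horizontal spreads."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def group_into_words(spreads: List[Span], gap_threshold: int) -> List[List[Span]]:
    """Group left-to-right spreads into words.

    A new word starts whenever the gap to the previous spread is at least
    gap_threshold (a hypothetical, empirically chosen value).
    """
    words: List[List[Span]] = []
    for spread in spreads:
        if words and spread[0] - words[-1][-1][1] < gap_threshold:
            words[-1].append(spread)
        else:
            words.append([spread])
    return words


# Two rows of a made-up text line, each as run lengths (background first).
rows = [
    [2, 3, 1, 2, 6, 4],
    [1, 3, 3, 2, 5, 3, 1],
]
all_spans = [span for row in rows for span in foreground_spans(row)]
spreads = merge_spans(all_spans)
words = group_into_words(spreads, gap_threshold=3)
print(words)
```

Working on run lengths keeps the cost proportional to the number of runs rather than the number of pixels, which is the kind of saving the abstract attributes to compressed-domain processing. Separating touching characters within a word (the min-cut step) is not covered by this sketch.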

Comments:	17 pages,19 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1909.05146 [cs.CV]
	(or arXiv:1909.05146v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1909.05146

Submission history

From: Amarnath R
[v1] Sun, 18 Aug 2019 09:48:52 UTC (1,482 KB)
