A Hybrid Approach and Unified Framework for Bibliographic Reference Extraction

Rizvi, Syed Tahseen Raza; Dengel, Andreas; Ahmed, Sheraz

doi:10.1109/ACCESS.2020.3042455

Abstract: Publications are an integral part of the scientific community. Extracting bibliographic references from scientific publications is a challenging task due to the diversity of referencing styles and document layouts. Existing methods perform adequately on one dataset; however, applying them to a different dataset proves challenging. A generic solution that overcomes the limitations of previous approaches is therefore needed. The contribution of this paper is three-fold. First, it presents a novel approach called DeepBiRD, which is inspired by human visual perception and exploits layout features to identify individual references in a scientific publication. Second, we release a large dataset for image-based reference detection with 2401 scans containing 38863 references, all manually annotated at the level of individual references. Third, we present a unified and highly configurable end-to-end automatic bibliographic reference extraction framework called BRExSys, which employs DeepBiRD along with state-of-the-art text-based models to detect and visualize references from a bibliographic document. The proposed approach first pre-processes each image, producing a hybrid representation by applying different computer vision techniques to it. It then performs layout-driven reference detection on the scientific publication using Mask R-CNN. DeepBiRD was evaluated on two different datasets to demonstrate that the approach generalizes. The proposed system achieved an AP50 of 98.56% on our dataset. DeepBiRD also significantly outperformed the current state-of-the-art approach on that approach's own dataset. These results suggest that DeepBiRD is significantly superior in performance, generalized, and independent of any domain or referencing style.
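
The AP50 figure above is conventionally read as average precision where a detection counts as correct if its intersection over union (IoU) with a ground-truth region is at least 0.5. The sketch below illustrates only that matching criterion, on axis-aligned boxes with made-up coordinates. It is not the authors' evaluation code: the function names, the box format `(x1, y1, x2, y2)`, and the greedy matching order are assumptions for illustration, and a full AP50 computation would additionally integrate precision over recall.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(predictions, ground_truth, threshold=0.5):
    """Count detections matched to a ground-truth box at IoU >= threshold.

    predictions: list of (box, confidence) pairs (hypothetical format).
    Each ground-truth box can be matched at most once; predictions are
    visited in order of decreasing confidence.
    """
    matched = set()
    true_positives = 0
    for box, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_index, best_overlap = None, threshold
        for index, gt_box in enumerate(ground_truth):
            if index in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_overlap:
                best_index, best_overlap = index, overlap
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    return true_positives


# Two hypothetical reference regions on a page and two detections.
ground_truth = [(0, 0, 100, 20), (0, 30, 100, 50)]
predictions = [((0, 0, 100, 18), 0.9), ((0, 40, 100, 70), 0.8)]
print(count_true_positives(predictions, ground_truth))
```

In this toy case the first detection overlaps its reference with an IoU of 0.9 and counts as a match, while the second reaches only 0.25 and does not.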

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL)
Cite as:	arXiv:1912.07266 [cs.CV]
	(or arXiv:1912.07266v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1912.07266
Related DOI:	https://doi.org/10.1109/ACCESS.2020.3042455

