arXiv:2505.20428
The UD-NewsCrawl Treebank: Reflections and Challenges from a Large-scale Tagalog Syntactic Annotation Project
Abstract: This paper presents UD-NewsCrawl, the largest Tagalog treebank to date, containing 15.6k trees manually annotated according to the Universal Dependencies framework. We detail our treebank development process, including data collection, pre-processing, manual annotation, and quality assurance procedures. We provide baseline evaluations using multiple transformer-based models to assess the performan…
Submitted 26 May, 2025; originally announced May 2025.
Comments: Link to treebank: https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl ; all authors contributed equally to this work