Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo
Authors:
Roberto Zariquiey,
Claudia Alvarado,
Ximena Echevarria,
Luisa Gomez,
Rosa Gonzales,
Mariana Illescas,
Sabina Oporto,
Frederic Blum,
Arturo Oncevay,
Javier Vera
Abstract:
In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective for creating a treebank in the context of a Computational Linguistics course for undergraduates. Then, we describe the general details of the treebank and the language-specific considerations implemented for the proposed annotation. We finally conduct some experiments on part-of-speech tagging and syntactic dependency parsing. We focus on monolingual and transfer learning settings, where we study the impact of a Shipibo-Konibo treebank, another Panoan language resource.
Submitted 21 June, 2022;
originally announced June 2022.