-
Evaluation of the Accuracy of the BGLemmatizer
Authors:
Elena Karashtranova,
Grigor Iliev,
Nadezhda Borisova,
Yana Chankova,
Irena Atanasova
Abstract:
This paper presents an analysis of the accuracy of software developed for automatic lemmatization of Bulgarian. The lemmatization software is written entirely in Java and is distributed as a GATE plugin. Statistical methods are used to estimate the accuracy of this software. The analysis shows a lemmatization accuracy of 95%.
Submitted 13 June, 2015;
originally announced June 2015.
-
A Publicly Available Cross-Platform Lemmatizer for Bulgarian
Authors:
Grigor Iliev,
Nadezhda Borisova,
Elena Karashtranova,
Dafina Kostadinova
Abstract:
The dictionary-based lemmatizer for the Bulgarian language presented here is distributed as free software, publicly available to download and use under the GPL v3 license. The software is written entirely in Java and is distributed as a GATE plugin. To the best of our knowledge, at the time of writing there are no other free lemmatization tools specifically targeting Bulgarian. The lemmatizer is a work in progress and currently yields an accuracy of about 95% when compared against the manually annotated corpus BulTreeBank-Morph, which contains 273933 tokens.
Submitted 13 June, 2015;
originally announced June 2015.
-
On Detecting Noun-Adjective Agreement Errors in Bulgarian Language Using GATE
Authors:
Nadezhda Borisova,
Grigor Iliev,
Elena Karashtranova
Abstract:
In this article, we describe an approach to the automatic detection of noun-adjective agreement errors in Bulgarian texts by explaining the steps required to develop a simple Java-based language processing application. For this purpose, we use the GATE language processing framework, which can analyze texts in Bulgarian and can be embedded in software applications through a set of Java APIs. In our example application we also demonstrate how to use GATE to apply regular expressions over annotations in order to detect agreement errors in simple two-word noun phrases, in which an attributive adjective precedes a noun. The provided code samples can also serve as a starting point for implementing natural language processing functionality in software applications, for tasks such as the detection, annotation and retrieval of word groups that meet a specific set of criteria.
Submitted 3 November, 2014;
originally announced November 2014.