-
Geocoding multilingual texts: Recognition, disambiguation and visualisation
Abstract: We are presenting a method to recognise geographical references in free text. Our tool must work on various languages with a minimum of language-dependent resources, except a gazetteer. The main difficulty is to disambiguate these place names by distinguishing places from persons and by selecting the most likely place out of a list of homographic place names world-wide. The system uses a number…
Submitted 12 September, 2006; originally announced September 2006.
Comments: 6 pages
ACM Class: H.3.1; H.3.3; H.3.4
Journal ref: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), pp. 53-58. Genoa, Italy, 24-26 May 2006
-
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages
Abstract: We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature. It is available in all 20 official EU languages, with additional documents being available in the languages of the EU candidate countries. The corpus consists of almost 8,000 documents per language, with an average size of nearly 9 million words per language. Pair-wise par…
Submitted 12 September, 2006; originally announced September 2006.
Comments: A multilingual textual resource with meta-data freely available for download at http://langtech.jrc.it/JRC-Acquis.html
ACM Class: H.3.1; H.3.6
Journal ref: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006), pp. 2142-2147. Genoa, Italy, 24-26 May 2006
-
Multilingual person name recognition and transliteration
Abstract: We present an exploratory tool that extracts person names from multilingual news collections, matches name variants referring to the same person, and infers relationships between people based on the co-occurrence of their names in related news. A novel feature is the matching of name variants across languages and writing systems, including names written with the Greek, Cyrillic and Arabic writin…
Submitted 11 September, 2006; originally announced September 2006.
Comments: Explains the technology behind the JRC's NewsExplorer application, which is freely accessible at http://press.jrc.it/NewsExplorer
ACM Class: H.3.1; H.3.3; H.3.4; H.3.5
Journal ref: Journal CORELA - Cognition, Representation, Langage. Numeros speciaux, Le traitement lexicographique des noms propres. December 2005. ISSN 1638-5748