-
Urdu Morphology, Orthography and Lexicon Extraction
Abstract: Urdu is a challenging language because of, first, its Perso-Arabic script and second, its morphological system having inherent grammatical forms and vocabulary of Arabic, Persian and the native languages of South Asia. This paper describes an implementation of the Urdu language as a software API, and we deal with orthography, morphology and the extraction of the lexicon. The morphology is implemen…
Submitted 6 April, 2022; originally announced April 2022.
Comments: Published in CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages, July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University
-
Embedded Controlled Languages
Abstract: Inspired by embedded programming languages, an embedded CNL (controlled natural language) is a proper fragment of an entire natural language (its host language), but it has a parser that recognizes the entire host language. This makes it possible to process out-of-CNL input and give useful feedback to users, instead of just reporting syntax errors. This extended abstract explains the main concepts…
Submitted 16 June, 2014; originally announced June 2014.
Comments: 7 pages, extended abstract, preprint for CNL 2014 in Galway