Showing 1–2 of 2 results for author: Rejmund, E

  1. arXiv:1603.06785

    cs.CL stat.ML

    Multi-domain machine translation enhancements by parallel data extraction from comparable corpora

    Authors: Krzysztof Wołk, Emilia Rejmund, Krzysztof Marasek

    Abstract: Parallel texts are a relatively rare language resource, however, they constitute a very useful research material with a wide range of applications. This study presents and analyses new methodologies we developed for obtaining such data from previously built comparable corpora. The methodologies are automatic and unsupervised which makes them good for large scale research. The task is highly practi…

    Submitted 22 March, 2016; originally announced March 2016.

    Comments: parallel corpus, Polish, English, machine learning, comparable corpora, NLP. In: Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej. ISBN: 978-83-935320-4

  2. Harvesting comparable corpora and mining them for equivalent bilingual sentences using statistical classification and analogy-based heuristics

    Authors: Krzysztof Wołk, Emilia Rejmund, Krzysztof Marasek

    Abstract: Parallel sentences are a relatively scarce but extremely useful resource for many applications including cross-lingual retrieval and statistical machine translation. This research explores our new methodologies for mining such data from previously obtained comparable corpora. The task is highly practical since non-parallel multilingual data exist in far greater quantities than parallel corpora, bu…

    Submitted 18 November, 2015; originally announced November 2015.

    Comments: Springer p. 433-441, 2015