Showing 1–7 of 7 results for author: Mohtaj, S

Searching in archive cs.
  1. arXiv:2207.06265  [pdf, other]

    cs.CL cs.AI cs.LG

    A Transfer Learning Based Model for Text Readability Assessment in German

    Authors: Salar Mohtaj, Babak Naderi, Sebastian Möller, Faraz Maschhur, Chuyang Wu, Max Reinhard

    Abstract: Text readability assessment has a wide range of applications for different target people, from language learners to people with disabilities. The fast pace of textual content production on the web makes it impossible to measure text complexity without the benefit of machine learning and natural language processing techniques. Although various research addressed the readability assessment of Englis…

    Submitted 6 September, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  2. arXiv:2201.06573  [pdf, other]

    cs.CL cs.AI cs.LG

    PerPaDa: A Persian Paraphrase Dataset based on Implicit Crowdsourcing Data Collection

    Authors: Salar Mohtaj, Fatemeh Tavakkoli, Habibollah Asghari

    Abstract: In this paper we introduce PerPaDa, a Persian paraphrase dataset that is collected from users' input in a plagiarism detection system. As an implicit crowdsourcing experience, we have gathered a large collection of original and paraphrased sentences from Hamtajoo, a Persian plagiarism detection system, in which users try to conceal cases of text re-use in their documents by paraphrasing and re-sub…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: Submitted to LREC 2022

    Journal ref: Proceedings of the Language Resources and Evaluation Conference. 2022; 5090-5096

  3. arXiv:2201.06286  [pdf, other]

    cs.CL cs.AI cs.LG

    MuLVE, A Multi-Language Vocabulary Evaluation Data Set

    Authors: Anik Jacobsen, Salar Mohtaj, Sebastian Möller

    Abstract: Vocabulary learning is vital to foreign language learning. Correct and adequate feedback is essential to successful and satisfying vocabulary training. However, many vocabulary and language evaluation systems perform on simple rules and do not account for real-life user learning data. This work introduces Multi-Language Vocabulary Evaluation Data Set (MuLVE), a data set consisting of vocabulary ca…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: Submitted to LREC 2022

    Journal ref: Proceedings of the Language Resources and Evaluation Conference. 2022; 673-679

  4. arXiv:2201.04227  [pdf, other]

    cs.CL cs.AI cs.LG

    A Feature Extraction based Model for Hate Speech Identification

    Authors: Salar Mohtaj, Vera Schmitt, Sebastian Möller

    Abstract: The detection of hate speech online has become an important task, as offensive language such as hurtful, obscene and insulting content can harm marginalized people or groups. This paper presents the TU Berlin team's experiments and results on tasks 1A and 1B of the shared task on hate speech and offensive content identification in Indo-European languages 2021. The success of different Natural Languag…

    Submitted 11 January, 2022; originally announced January 2022.

    Comments: Accepted at FIRE 2021 - Hate Speech and offensive content detection (HASOC) Track

  5. arXiv:2112.13742  [pdf, other]

    cs.CL cs.IR cs.LG

    Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts

    Authors: Vahid Zarrabi, Salar Mohtaj, Habibollah Asghari

    Abstract: In recent years, due to the high availability of electronic documents through the Web, plagiarism has become a serious challenge, especially among scholars. Various plagiarism detection systems have been developed to prevent text re-use and to confront plagiarism. Although it is relatively easy to detect duplicate text in academic manuscripts, finding patterns of text re-use that has been semantic…

    Submitted 27 December, 2021; originally announced December 2021.

  6. arXiv:2002.06585  [pdf]

    cs.CY cs.IR

    Untrue.News: A New Search Engine For Fake Stories

    Authors: Vinicius Woloszyn, Felipe Schaeffer, Beliza Boniatti, Eduardo Cortes, Salar Mohtaj, Sebastian Möller

    Abstract: In this paper, we demonstrate Untrue News, a new search engine for fake stories. Untrue News is easy to use and offers useful features such as: a) a multi-language option combining fake stories from different countries and languages around the same subject or person; b) a user privacy protector, avoiding the filter bubble by employing a bias-free ranking scheme; and c) a collaborative platform th…

    Submitted 16 February, 2020; originally announced February 2020.

  7. arXiv:1904.07733  [pdf, other]

    cs.CL

    Subjective Assessment of Text Complexity: A Dataset for German Language

    Authors: Babak Naderi, Salar Mohtaj, Kaspar Ensikat, Sebastian Möller

    Abstract: This paper presents TextComplexityDE, a dataset consisting of 1000 sentences in German taken from 23 Wikipedia articles in 3 different article genres, to be used for developing text-complexity predictor models and automatic text simplification in German. The dataset includes subjective assessments of different text-complexity aspects provided by German learners at levels A and B. In…

    Submitted 16 April, 2019; originally announced April 2019.