Showing 1–2 of 2 results for author: Shingange, M

  1. arXiv:2409.00626  [pdf, other]

    cs.CL

    Correcting FLORES Evaluation Dataset for Four African Languages

    Authors: Idris Abdulmumin, Sthembiso Mkhwanazi, Mahlatse S. Mbooi, Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad, Neo Putini, Miehleketo Mathebula, Matimba Shingange, Tajuddeen Gwadabe, Vukosi Marivate

    Abstract: This paper describes the corrections made to the FLORES evaluation (dev and devtest) dataset for four African languages, namely Hausa, Northern Sotho (Sepedi), Xitsonga, and isiZulu. The original dataset, though groundbreaking in its coverage of low-resource languages, exhibited various inconsistencies and inaccuracies in the reviewed languages that could potentially hinder the integrity of the ev…

    Submitted 5 October, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

  2. arXiv:2303.03750  [pdf, other]

    cs.CL

    Preparing the Vuk'uzenzele and ZA-gov-multilingual South African multilingual corpora

    Authors: Richard Lastrucci, Isheanesu Dzingirai, Jenalea Rajab, Andani Madodonga, Matimba Shingange, Daniel Njini, Vukosi Marivate

    Abstract: This paper introduces two multilingual government themed corpora in various South African languages. The corpora were collected by gathering the South African Government newspaper (Vuk'uzenzele), as well as South African government speeches (ZA-gov-multilingual), that are translated into all 11 South African official languages. The corpora can be used for a myriad of downstream NLP tasks. The corp…

    Submitted 5 April, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted and to appear at Fourth workshop on Resources for African Indigenous Languages (RAIL) at EACL 2023