
Showing 1–7 of 7 results for author: Galadanci, B S

Searching in archive cs.
  1. Quantity vs. Quality of Monolingual Source Data in Automatic Text Translation: Can It Be Too Little If It Is Too Good?

    Authors: Idris Abdulmumin, Bashir Shehu Galadanci, Garba Aliyu, Shamsuddeen Hassan Muhammad

    Abstract: Monolingual data, being readily available in large quantities, has been used to upscale the scarcely available parallel data to train better models for automatic translation. Self-learning, where a model is made to learn from its output, is one approach to exploit such data. However, it has been shown that too much of this data can be detrimental to the performance of the model if the available pa…

    Submitted 17 October, 2024; originally announced October 2024.

  2. arXiv:2205.01133

    cs.CL cs.CV cs.LG

    Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation

    Authors: Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, Shantipriya Parida, Shamsuddeen Hassan Muhammad, Ibrahim Sa'id Ahmad, Subhadarshi Panda, Ondřej Bojar, Bashir Shehu Galadanci, Bello Shehu Bello

    Abstract: Multi-modal Machine Translation (MMT) enables the use of visual information to enhance the quality of translations. The visual information can serve as a valuable piece of context information to decrease the ambiguity of input sentences. Despite the increasing popularity of such a technique, good and sizeable datasets are scarce, limiting the full extent of their potential. Hausa, a Chadic languag…

    Submitted 6 May, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: Accepted at Language Resources and Evaluation Conference 2022 (LREC2022)

  3. arXiv:2011.07403

    cs.CL cs.LG

    A Hybrid Approach for Improved Low Resource Neural Machine Translation using Monolingual Data

    Authors: Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isa, Habeebah Adamu Kakudi, Ismaila Idris Sinan

    Abstract: Many language pairs are low resource, meaning the amount and/or quality of available parallel data is not sufficient to train a neural machine translation (NMT) model which can reach an acceptable standard of accuracy. Many works have explored using the readily available monolingual data in either or both of the languages to improve the standard of translation models in low, and even high, resourc…

    Submitted 22 November, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

    Comments: 16 pages, 4 figures, 10 tables

    Journal ref: Engineering Letters, vol. 29, no. 4, pp. 1478-1493, 2021

  4. Enhanced back-translation for low resource neural machine translation using self-training

    Authors: Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isa

    Abstract: Improving neural machine translation (NMT) models using the back-translations of the monolingual target data (synthetic parallel data) is currently the state-of-the-art approach for training improved translation systems. The quality of the backward system, which is trained on the available parallel data and used for the back-translation, has been shown in many studies to affect the performance o…

    Submitted 24 December, 2020; v1 submitted 4 June, 2020; originally announced June 2020.

    Comments: 17 pages, 3 figures, 5 tables; Accepted for publication in the International Conference on Information and Communication Technology and Applications (ICTA 2020)

  5. arXiv:2001.11327   

    cs.CL

    Iterative Batch Back-Translation for Neural Machine Translation: A Conceptual Model

    Authors: Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isa

    Abstract: An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of back-translations of the target-side monolingual data. Recently, iterative back-translation has been shown to outperform standard back-translation albeit on some language pairs. This work proposes the iterative batch back-translation that is aimed at enh…

    Submitted 16 November, 2020; v1 submitted 26 November, 2019; originally announced January 2020.

    Comments: This article was a proposal (a conceptual model) and therefore substantially overlaps with arXiv:1912.10514. The research has since been substantially reworked; some of the findings are presented in arXiv:1912.10514, arXiv:2006.02876 and arXiv:2011.07403. The final work will be submitted for publication in due course.

  6. Tag-less Back-Translation

    Authors: Idris Abdulmumin, Bashir Shehu Galadanci, Aliyu Garba

    Abstract: An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of the back-translations of the target-side monolingual data. The standard back-translation method has been shown to be unable to efficiently utilize the huge amount of available monolingual data because of the inability of translation models to di…

    Submitted 9 February, 2021; v1 submitted 22 December, 2019; originally announced December 2019.

    Comments: 29 pages, 4 figures, 13 tables

  7. hauWE: Hausa Words Embedding for Natural Language Processing

    Authors: Idris Abdulmumin, Bashir Shehu Galadanci

    Abstract: Word embeddings (distributed word vector representations) have become an essential component of many natural language processing (NLP) tasks such as machine translation, sentiment analysis, word analogy, named entity recognition and word similarity. Despite this, the only work that provides word vectors for the Hausa language is that of Bojanowski et al. [1] trained using fastText, consisting of only…

    Submitted 25 November, 2019; originally announced November 2019.

    Comments: In Proceedings of the 2019 2nd International Conference of the IEEE Nigeria Computer Chapter
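Several of the listed papers (items 4 to 6) build on back-translation, in which a target-to-source ("backward") model translates target-side monolingual text to produce synthetic parallel pairs for training the source-to-target model. The following is a minimal Python sketch of that standard procedure only, not of the enhanced, iterative batch, or tag-less variants proposed in the papers. The function names and the word-by-word `toy_backward` model are hypothetical stand-ins for a trained NMT system.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(target_monolingual: List[str],
                   backward_model: Callable[[str], str]) -> List[Pair]:
    """Create synthetic pairs: the source side is machine-generated by the
    backward (target->source) model; the target side is authentic text."""
    return [(backward_model(t), t) for t in target_monolingual]


def build_training_data(authentic: List[Pair],
                        target_monolingual: List[str],
                        backward_model: Callable[[str], str]) -> List[Pair]:
    """Standard back-translation: append the synthetic pairs to the
    authentic parallel data used to train the forward model."""
    return authentic + back_translate(target_monolingual, backward_model)


# Hypothetical toy backward model: word-by-word lookup with made-up tokens,
# standing in for an NMT model trained on the authentic parallel data.
TOY_LEXICON: Dict[str, str] = {"xa": "ab", "xb": "cd"}


def toy_backward(sentence: str) -> str:
    return " ".join(TOY_LEXICON.get(w, w) for w in sentence.split())


authentic = [("ab cd", "xa xb")]   # authentic (source, target) pair
mono = ["xb xa", "xb"]             # target-side monolingual sentences
data = build_training_data(authentic, mono, toy_backward)
print(data)  # [('ab cd', 'xa xb'), ('cd ab', 'xb xa'), ('cd', 'xb')]
```

Keeping the authentic target sentences untouched is the point of the design: only the source side of each synthetic pair carries the backward model's errors, which is why the quality of that backward model matters.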