
Showing 1–18 of 18 results for author: Shterionov, D

  1. arXiv:2503.01553  [pdf, ps, other]

    cs.CL

    Co-creation for Sign Language Processing and Machine Translation

    Authors: Lisa Lepp, Dimitar Shterionov, Mirella De Sisto, Grzegorz Chrupała

    Abstract: Sign language machine translation (SLMT) -- the task of automatically translating between sign and spoken languages or between sign languages -- is a complex task within the field of NLP. Its multi-modal and non-linear nature requires the joint efforts of sign language (SL) linguists, technical experts and SL users. Effective user involvement is a challenge that can be addressed through co-creation…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Submitted to the MDPI special issue "Human and Machine Translation: Recent Trends and Foundations"

  2. arXiv:2501.09534  [pdf, other]

    cs.AI

    AI in Support of Diversity and Inclusion

    Authors: Çiçek Güven, Afra Alishahi, Henry Brighton, Gonzalo Nápoles, Juan Sebastian Olier, Marie Šafář, Eric Postma, Dimitar Shterionov, Mirella De Sisto, Eva Vanmassenhove

    Abstract: In this paper, we elaborate on how AI can support diversity and inclusion and exemplify research projects conducted in that direction. We start by looking at the challenges and progress in making large language models (LLMs) more transparent, inclusive, and aware of social biases. Even though LLMs like ChatGPT have impressive abilities, they struggle to understand different cultural contexts and e…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 14 pages, 2 figures

  3. arXiv:2406.07970  [pdf, other]

    cs.CL

    Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation

    Authors: Javad Pourmostafa Roshan Sharami, Dimitar Shterionov, Pieter Spronck

    Abstract: The quality of output from large language models (LLMs), particularly in machine translation (MT), is closely tied to the quality of in-context examples (ICEs) provided along with the query, i.e., the text to translate. The effectiveness of these ICEs is influenced by various factors, such as the domain of the source text, the order in which the ICEs are presented, the number of these examples, an…

    Submitted 18 September, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Camera-ready version of the paper for the Association for Machine Translation in the Americas (AMTA), including the link to the paper's repository

  4. arXiv:2304.08891  [pdf, other]

    cs.CL

    Tailoring Domain Adaptation for Machine Translation Quality Estimation

    Authors: Javad Pourmostafa Roshan Sharami, Dimitar Shterionov, Frédéric Blain, Eva Vanmassenhove, Mirella De Sisto, Chris Emmery, Pieter Spronck

    Abstract: While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able t…

    Submitted 9 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted to EAMT 2023 (main)

  5. arXiv:2303.00722  [pdf, other]

    cs.CL cs.AI

    A Systematic Analysis of Vocabulary and BPE Settings for Optimal Fine-tuning of NMT: A Case Study of In-domain Translation

    Authors: J. Pourmostafa Roshan Sharami, D. Shterionov, P. Spronck

    Abstract: The effectiveness of Neural Machine Translation (NMT) models largely depends on the vocabulary used at training; small vocabularies can lead to out-of-vocabulary problems -- large ones, to memory issues. Subword (SW) tokenization has been successfully employed to mitigate these issues. The choice of vocabulary and SW tokenization has a significant impact on both training and fine-tuning an NMT mod…

    Submitted 1 March, 2023; originally announced March 2023.

  6. Evaluating the Effectiveness of Pre-trained Language Models in Predicting the Helpfulness of Online Product Reviews

    Authors: Ali Boluki, Javad Pourmostafa Roshan Sharami, Dimitar Shterionov

    Abstract: Businesses and customers can gain valuable information from product reviews. The sheer number of reviews often necessitates ranking them based on their potential helpfulness. However, only a few reviews ever receive any helpfulness votes on online marketplaces. Sorting all reviews based on the few existing votes can cause helpful reviews to go unnoticed because of the limited attention span of rea…

    Submitted 19 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 2023 Intelligent Systems Conference (IntelliSys), vol. 4, (2023), 15-35

  7. Machine Translation from Signed to Spoken Languages: State of the Art and Challenges

    Authors: Mathieu De Coster, Dimitar Shterionov, Mieke Van Herreweghe, Joni Dambre

    Abstract: Automatic translation from signed to spoken languages is an interdisciplinary research domain, lying on the intersection of computer vision, machine translation and linguistics. Nevertheless, research in this domain is performed mostly by computer scientists in isolation. As the domain is becoming increasingly popular - the majority of scientific papers on the topic of sign language translation ha…

    Submitted 5 April, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: This is the version of the article submitted to peer review to Universal Access in the Information Society. Please refer to "De Coster, M., Shterionov, D., Van Herreweghe, M. et al. Machine translation from signed to spoken languages: state of the art and challenges. Univ Access Inf Soc (2023)." for the published and updated version

  8. arXiv:2202.02170  [pdf, other]

    cs.CL

    The Ecological Footprint of Neural Machine Translation Systems

    Authors: Dimitar Shterionov, Eva Vanmassenhove

    Abstract: Over the past decade, deep learning (DL) has led to significant advancements in various fields of artificial intelligence, including machine translation (MT). These advancements would not be possible without the ever-growing volumes of data and the hardware that allows large DL models to be trained efficiently. Due to the large amount of computing cores as well as dedicated memory, graphics proces…

    Submitted 4 February, 2022; originally announced February 2022.

    Comments: 25 pages, 3 figures, 10 tables

  9. arXiv:2112.06096  [pdf, other]

    cs.CL cs.AI cs.LG

    Selecting Parallel In-domain Sentences for Neural Machine Translation Using Monolingual Texts

    Authors: Javad Pourmostafa Roshan Sharami, Dimitar Shterionov, Pieter Spronck

    Abstract: Continuously-growing data volumes lead to larger generic models. Specific use-cases are usually left out, since generic models tend to perform poorly in domain-specific cases. Our work addresses this gap with a method for selecting in-domain data from generic-domain (parallel text) corpora, for the task of machine translation. The proposed method ranks sentences in parallel general-domain data acc…

    Submitted 6 February, 2022; v1 submitted 11 December, 2021; originally announced December 2021.

    Comments: Accepted to the CLIN Journal on Dec 6, 2021 (Camera-ready Version)

  10. arXiv:2109.06105  [pdf, other]

    cs.CL cs.AI

    NeuTral Rewriter: A Rule-Based and Neural Approach to Automatic Rewriting into Gender-Neutral Alternatives

    Authors: Eva Vanmassenhove, Chris Emmery, Dimitar Shterionov

    Abstract: Recent years have seen an increasing need for gender-neutral and inclusive language. Within the field of NLP, there are various mono- and bilingual use cases where gender inclusive language is appropriate, if not preferred due to ambiguity or uncertainty in terms of the gender of referents. In this work, we present a rule-based and a neural approach to gender-neutral rewriting for English along wi…

    Submitted 13 September, 2021; originally announced September 2021.

  11. arXiv:2102.00287  [pdf, other]

    cs.CL cs.AI cs.CY

    Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation

    Authors: Eva Vanmassenhove, Dimitar Shterionov, Matthew Gwilliam

    Abstract: Recent studies in the field of Machine Translation (MT) and Natural Language Processing (NLP) have shown that existing models amplify biases observed in the training data. The amplification of biases in language technology has mainly been examined with respect to specific phenomena, such as gender bias. In this work, we go beyond the study of gender in MT and investigate how bias amplification mig…

    Submitted 30 January, 2021; originally announced February 2021.

  12. arXiv:2005.00308  [pdf, other]

    cs.CL

    Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation

    Authors: Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way

    Abstract: Machine translation (MT) has benefited from using synthetic training data originating from translating monolingual corpora, a technique known as backtranslation. Combining backtranslated data from different sources has led to better results than when using such data in isolation. In this work we analyse the impact that data translated with rule-based, phrase-based statistical and neural MT systems…

    Submitted 1 May, 2020; originally announced May 2020.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL (2020)

  13. arXiv:1909.03750  [pdf, other]

    cs.CL

    Combining SMT and NMT Back-Translated Data for Efficient NMT

    Authors: Alberto Poncelas, Maja Popovic, Dimitar Shterionov, Gideon Maillette de Buy Wenniger, Andy Way

    Abstract: Neural Machine Translation (NMT) models achieve their best performance when large sets of parallel data are used for training. Consequently, techniques for augmenting the training set have become popular recently. One of these methods is back-translation (Sennrich et al., 2016), which consists in generating synthetic sentences by translating a set of monolingual, target-language sentences using a…

    Submitted 9 September, 2019; originally announced September 2019.

    Journal ref: Proceedings of Recent Advances in Natural Language Processing (RANLP 2019). pages 922--931

  14. arXiv:1906.12068  [pdf, other]

    cs.CL cs.LG

    Lost in Translation: Loss and Decay of Linguistic Richness in Machine Translation

    Authors: Eva Vanmassenhove, Dimitar Shterionov, Andy Way

    Abstract: This work presents an empirical approach to quantifying the loss of lexical richness in Machine Translation (MT) systems compared to Human Translation (HT). Our experiments show how current MT systems indeed fail to render the lexical diversity of human generated or translated text. The inability of MT systems to generate diverse outputs and its tendency to exacerbate already frequent patterns whi…

    Submitted 28 June, 2019; originally announced June 2019.

    Comments: Accepted for publication at the 17th Machine Translation Summit (MTSummit2019), Dublin, Ireland, August 2019

  15. arXiv:1902.08856  [pdf, ps, other]

    cs.CL

    ABI Neural Ensemble Model for Gender Prediction Adapt Bar-Ilan Submission for the CLIN29 Shared Task on Gender Prediction

    Authors: Eva Vanmassenhove, Amit Moryossef, Alberto Poncelas, Andy Way, Dimitar Shterionov

    Abstract: We present our system for the CLIN29 shared task on cross-genre gender detection for Dutch. We experimented with a multitude of neural models (CNN, RNN, LSTM, etc.), more "traditional" models (SVM, RF, LogReg, etc.), different feature sets as well as data pre-processing. The final results suggested that using tokenized, non-lowercased data works best for most of the neural models, while a combinat…

    Submitted 23 February, 2019; originally announced February 2019.

    Comments: Conference: Computational Linguistics of the Netherlands CLIN29

  16. arXiv:1804.06189  [pdf, other]

    cs.CL

    Investigating Backtranslation in Neural Machine Translation

    Authors: Alberto Poncelas, Dimitar Shterionov, Andy Way, Gideon Maillette de Buy Wenniger, Peyman Passban

    Abstract: A prerequisite for training corpus-based machine translation (MT) systems -- either Statistical MT (SMT) or Neural MT (NMT) -- is the availability of high-quality parallel data. This is arguably more important today than ever before, as NMT has been shown in many studies to outperform SMT, but mostly when large parallel corpora are available; in cases where data is limited, SMT can still outperfor…

    Submitted 17 April, 2018; originally announced April 2018.

  17. arXiv:1304.6810  [pdf, other]

    cs.AI cs.LG cs.LO

    Inference and learning in probabilistic logic programs using weighted Boolean formulas

    Authors: Daan Fierens, Guy Van den Broeck, Joris Renkens, Dimitar Shterionov, Bernd Gutmann, Ingo Thon, Gerda Janssens, Luc De Raedt

    Abstract: Probabilistic logic programs are logic programs in which some of the facts are annotated with probabilities. This paper investigates how classical inference and learning tasks known from the graphical model community can be tackled for probabilistic logic programs. Several such tasks such as computing the marginals given evidence and learning from (partial) interpretations have not really been add…

    Submitted 25 April, 2013; originally announced April 2013.

    Comments: To appear in Theory and Practice of Logic Programming (TPLP)

    Journal ref: Theory and Practice of Logic Programming 15 (2015) 358-401

  18. arXiv:1009.3798  [pdf, other]

    cs.LO

    DNF Sampling for ProbLog Inference

    Authors: Dimitar Sht. Shterionov, Angelika Kimmig, Theofrastos Mantadelis, Gerda Janssens

    Abstract: Inference in probabilistic logic languages such as ProbLog, an extension of Prolog with probabilistic facts, is often based on a reduction to a propositional formula in DNF. Calculating the probability of such a formula involves the disjoint-sum-problem, which is computationally hard. In this work we introduce a new approximation method for ProbLog inference which exploits the DNF to focus samplin…

    Submitted 21 September, 2010; v1 submitted 20 September, 2010; originally announced September 2010.

    Comments: Online proceedings of the Joint Workshop on Implementation of Constraint Logic Programming Systems and Logic-based Methods in Programming Environments (CICLOPS-WLPE 2010), Edinburgh, Scotland, U.K., July 15, 2010

    Journal ref: Proceedings of CICLOPS-WLPE 2010