
Showing 1–6 of 6 results for author: Nisioi, S

  1. arXiv:2506.03820  [pdf]

    cs.CL

    Automatic Correction of Writing Anomalies in Hausa Texts

    Authors: Ahmad Mustapha Wali, Sergiu Nisioi

    Abstract: Hausa texts are often characterized by writing anomalies such as incorrect character substitutions and spacing errors, which sometimes hinder natural language processing (NLP) applications. This paper presents an approach to automatically correct the anomalies by finetuning transformer-based models. Using a corpus gathered from several public sources, we created a large-scale parallel dataset of o…

    Submitted 4 June, 2025; originally announced June 2025.

  2. arXiv:2505.03025  [pdf, ps, other]

    cs.CL cs.AI

    A Typology of Synthetic Datasets for Dialogue Processing in Clinical Contexts

    Authors: Steven Bedrick, A. Seza Doğruöz, Sergiu Nisioi

    Abstract: Synthetic data sets are used across linguistic domains and NLP tasks, particularly in scenarios where authentic data is limited (or even non-existent). One such domain is that of clinical (healthcare) contexts, where there exist significant and long-standing challenges (e.g., privacy, anonymization, and data governance) which have led to the development of an increasing number of synthetic dataset…

    Submitted 5 May, 2025; originally announced May 2025.

  3. arXiv:2410.17728  [pdf, other]

    cs.CL

    Dialectal and Low-Resource Machine Translation for Aromanian

    Authors: Alexandru-Iulius Jerpelea, Alina Rădoi, Sergiu Nisioi

    Abstract: This paper presents the process of building a neural machine translation system with support for English, Romanian, and Aromanian - an endangered Eastern Romance language. The primary contribution of this research is twofold: (1) the creation of the most extensive Aromanian-Romanian parallel corpus to date, consisting of 79,000 sentence pairs, and (2) the development and comparative analysis of se…

    Submitted 7 January, 2025; v1 submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted at COLING 2025

  4. arXiv:2403.11227  [pdf, other]

    cs.CL cs.LG

    Cheap Ways of Extracting Clinical Markers from Texts

    Authors: Anastasia Sandu, Teodor Mihailescu, Sergiu Nisioi

    Abstract: This paper describes the work of the UniBuc Archaeology team for CLPsych's 2024 Shared Task, which involved finding evidence within the text supporting the assigned suicide risk level. Two types of evidence were required: highlights (extracting relevant spans within the text) and summaries (aggregating evidence into a synthesis). Our work focuses on evaluating Large Language Models (LLM) as oppose…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: https://github.com/nlp-unibuc/clpsych24-task

  5. arXiv:1703.04336  [pdf, other]

    cs.IR cs.CL

    A Visual Representation of Wittgenstein's Tractatus Logico-Philosophicus

    Authors: Anca Bucur, Sergiu Nisioi

    Abstract: In this paper we present a data visualization method together with its potential usefulness in digital humanities and philosophy of language. We compile a multilingual parallel corpus from different versions of Wittgenstein's Tractatus Logico-Philosophicus, including the original in German and translations into English, Spanish, French, and Russian. Using this corpus, we compute a similarity measu…

    Submitted 13 March, 2017; originally announced March 2017.

    Comments: Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

  6. arXiv:1609.03204  [pdf, other]

    cs.CL

    On the Similarities Between Native, Non-native and Translated Texts

    Authors: Ella Rabinovich, Sergiu Nisioi, Noam Ordan, Shuly Wintner

    Abstract: We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable…

    Submitted 11 September, 2016; originally announced September 2016.

    Comments: ACL2016, 12 pages