-
Specification languages for computational laws versus basic legal principles
Authors:
Petia Guintchev,
Joost J. Joosten,
Sofia Santiago Fernández,
Eric Sancho Adamson,
Aleix Solé Sánchez,
Marta Soria Heredia
Abstract:
We speak of a computational law when that law is intended to be enforced by software through an automated decision-making process. As digital technologies evolve to offer more solutions for public administrations, we see an ever-increasing number of computational laws. Traditionally, law is written in natural language. Computational laws, however, suffer various complications when written in natural language, such as underspecification and ambiguity, which leave the coder with a diversity of possible interpretations. These could potentially result in an uneven application of the law. Thus, resorting to formal languages to write computational laws is tempting. However, writing laws in a formal language leads to further complications, for example, incomprehensibility for non-experts, lack of explicit motivation of the decisions made, or difficulties in retrieving the data leading to the outcome. In this paper, we investigate how certain legal principles fare in both scenarios: computational law written in natural language or written in formal language. We use a running example from the European Union's road transport regulation to showcase the tensions arising, and the benefits of each language.
Submitted 12 March, 2025;
originally announced March 2025.
-
Conditioning LLMs to Generate Code-Switched Text
Authors:
Maite Heredia,
Gorka Labaka,
Jeremy Barnes,
Aitor Soroa
Abstract:
Code-switching (CS) is still a critical challenge in Natural Language Processing (NLP). Current Large Language Models (LLMs) struggle to interpret and generate code-switched text, primarily due to the scarcity of large-scale CS datasets for training. This paper presents a novel methodology to generate CS data using LLMs, and tests it on the English-Spanish language pair. We propose back-translating natural CS sentences into monolingual English, and using the resulting parallel corpus to fine-tune LLMs to turn monolingual sentences into CS. Unlike previous approaches to CS generation, our methodology uses natural CS data as a starting point, allowing models to learn its natural distribution beyond grammatical patterns. We thoroughly analyse the models' performance through a study on human preferences, a qualitative error analysis and an evaluation with popular automatic metrics. Results show that our methodology generates fluent code-switched text, expanding research opportunities in CS communication, and that traditional metrics do not correlate with human judgement when assessing the quality of the generated CS data. We release our code and generated dataset under a CC-BY-NC-SA license.
Submitted 26 May, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
EuskañolDS: A Naturally Sourced Corpus for Basque-Spanish Code-Switching
Authors:
Maite Heredia,
Jeremy Barnes,
Aitor Soroa
Abstract:
Code-switching (CS) remains a significant challenge in Natural Language Processing (NLP), mainly due to a lack of relevant data. In the context of the contact between the Basque and Spanish languages in the north of the Iberian Peninsula, CS frequently occurs in both formal and informal spontaneous interactions. However, resources to analyse this phenomenon and support the development and evaluation of models capable of understanding and generating code-switched language for this language pair are almost non-existent. We introduce a first approach to developing a naturally sourced corpus for Basque-Spanish code-switching. Our methodology consists of identifying CS texts from previously available corpora using language identification models, which are then manually validated to obtain a reliable subset of CS instances. We present the properties of our corpus and make it available under the name EuskañolDS.
Submitted 5 February, 2025;
originally announced February 2025.
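The EuskañolDS methodology first filters candidate code-switched texts automatically and only then validates them by hand. A minimal sketch of such a filter, assuming a word-level language tagger: here the tagger is replaced by two tiny hypothetical lexicons (`BASQUE`, `SPANISH`), and the threshold `min_share` is an illustrative parameter, not one taken from the paper. A sentence is flagged when each language accounts for at least that share of the tagged words.

```python
import re

# Hypothetical toy lexicons standing in for a real language identification
# model; a real pipeline would use a trained classifier instead.
BASQUE = {"kaixo", "eta", "bai", "ez", "zer", "moduz", "gaur", "oso"}
SPANISH = {"hola", "y", "sí", "no", "qué", "tal", "hoy", "muy", "pues", "bueno"}


def token_languages(sentence):
    """Tag each word as 'eu' (Basque), 'es' (Spanish) or 'other'."""
    labels = []
    for token in re.findall(r"\w+", sentence.lower()):
        if token in BASQUE:
            labels.append("eu")
        elif token in SPANISH:
            labels.append("es")
        else:
            labels.append("other")
    return labels


def is_cs_candidate(sentence, min_share=0.2):
    """Flag a sentence for manual validation if both languages are present
    and the minority language covers at least `min_share` of tagged words."""
    tagged = [lab for lab in token_languages(sentence) if lab != "other"]
    if not tagged:
        return False
    share_eu = tagged.count("eu") / len(tagged)
    return min_share <= share_eu <= 1 - min_share
```

Sentences that pass a filter like this are only candidates; as the abstract notes, the manual validation step is what makes the final subset reliable.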
-
HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation
Authors:
Jaione Bengoetxea,
Mikel Zubillaga,
Ekhi Azurmendi,
Maite Heredia,
Julen Etxaniz,
Markel Ferro,
Jeremy Barnes
Abstract:
In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot Filling and Dialect Identification, evaluated using data in different dialects of the Norwegian language. For Intent Detection and Slot Filling, we have fine-tuned a multitask model in a cross-lingual setting, to leverage the xSID dataset available in 17 languages. In the case of Dialect Identification, our final submission consists of a model fine-tuned on the provided development set, which obtained the highest scores within our experiments. Our final results on the test set show that our models' performance does not drop compared to the development set, likely due to the domain-specificity of the dataset and the similar distribution of both subsets. Finally, we also report an in-depth analysis of the provided datasets and their artifacts, as well as other sets of experiments that were carried out but did not yield the best results. Additionally, we present an analysis of the reasons why some methods have been more successful than others, mainly the impact of the combination of languages and the domain-specificity of the training data on the results.
Submitted 9 January, 2025; v1 submitted 13 December, 2024;
originally announced December 2024.
-
XNLIeu: a dataset for cross-lingual NLI in Basque
Authors:
Maite Heredia,
Julen Etxaniz,
Muitze Zulaika,
Xabier Saralegi,
Jeremy Barnes,
Aitor Soroa
Abstract:
XNLI is a popular Natural Language Inference (NLI) benchmark widely used to evaluate cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this paper, we expand XNLI to include Basque, a low-resource language that can greatly benefit from transfer-learning approaches. The new dataset, dubbed XNLIeu, has been developed by first machine-translating the English XNLI corpus into Basque, followed by a manual post-editing step. We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-editing on the MT system; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation. The results show that post-editing is necessary and that the translate-train cross-lingual strategy obtains better results overall, although the gain is lower when tested on a dataset that has been built natively from scratch. Our code and datasets are publicly available under open licenses.
Submitted 10 April, 2024;
originally announced April 2024.
-
On Optimal Coverage of a Tree with Multiple Robots
Authors:
I. Aldana-Galván,
J. C. Catana-Salazar,
J. M. Díaz-Báñez,
F. Duque,
R. Fabila-Monroy,
M. A. Heredia,
A. Ramírez-Vigueras,
J. Urrutia
Abstract:
We study the algorithmic problem of optimally covering a tree with $k$ mobile robots. The tree is known to all robots, and our goal is to assign a walk to each robot in such a way that the union of these walks covers the whole tree. We assume that all edges have the same length, and that traveling along an edge takes a unit of time. Two objective functions are considered: the cover time and the cover length. The cover time is the maximum time a robot needs to finish its assigned walk and the cover length is the sum of the lengths of all the walks. We also consider a variant in which the robots must rendezvous periodically at the same vertex in at most a certain number of moves. We show that the complexity of the problem differs for the two cost functions. For the cover time minimization problem, we prove that the problem is NP-hard when $k$ is part of the input, regardless of whether periodic rendezvous are required or not. For the cover length minimization problem, we show that it can be solved in polynomial time when periodic rendezvous are not required, and it is NP-hard otherwise.
Submitted 15 September, 2022;
originally announced September 2022.
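To make the two objectives concrete, the following sketch evaluates a given assignment of walks; it does not search for an optimal one (the cover-time version is NP-hard when $k$ is part of the input). It assumes that covering the tree means traversing every edge at least once; the function name `cover_costs` and the input encoding (edge list plus one vertex list per robot) are illustrative choices, not taken from the paper.

```python
def cover_costs(tree_edges, walks):
    """Return (cover_time, cover_length) for a set of robot walks on a tree.

    Each walk is a list of vertices; every move along an edge takes one unit
    of time. Raises ValueError if a move does not follow a tree edge or if
    some edge is never traversed.
    """
    if not walks:
        raise ValueError("at least one robot is required")
    edges = {frozenset(e) for e in tree_edges}
    covered = set()
    for walk in walks:
        for u, v in zip(walk, walk[1:]):
            step = frozenset((u, v))
            if step not in edges:
                raise ValueError(f"{u}-{v} is not a tree edge")
            covered.add(step)
    if covered != edges:
        raise ValueError("the walks do not cover every edge")
    lengths = [len(walk) - 1 for walk in walks]  # number of unit-time moves
    return max(lengths), sum(lengths)
```

For instance, on a star with centre 0 and leaves 1, 2, 3, a single robot walking 1, 0, 2, 0, 3 gives cover time 4 and cover length 4, whereas the two walks 0, 1, 0, 2 and 0, 3 lower the cover time to 3 while the cover length stays 4.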
-
Algorithms for the Euclidean Bipartite Edge Cover Problem
Authors:
Rodrigo A. Castro,
José M. Díaz-Báñez,
Marco A. Heredia,
Jorge Urrutia,
Inmaculada Ventura,
Francisco J. Zaragoza
Abstract:
Given a graph $G=(V,E)$ with costs on its edges, the minimum-cost edge cover problem consists of finding a subset of $E$ covering all vertices in $V$ at minimum cost. If $G$ is bipartite, this problem can be solved in time $O(|V|^3)$ via a well-known reduction to a maximum-cost matching problem on $G$. If in addition $V$ is a set of points on the Euclidean line, Collanino et al. showed that the problem can be solved in time $O(|V| \log |V|)$ and asked whether it can be solved in time $o(|V|^3)$ if $V$ is a set of points on the Euclidean plane. We answer this in the affirmative, giving an $O(|V|^{2.5} \log |V|)$ algorithm based on the Hungarian method using weighted Voronoi diagrams. We also propose some 2-approximation algorithms and give experimental results of our implementations.
Submitted 27 July, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
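The abstract relies on the standard reduction from minimum-cost edge cover to a matching problem. A minimal sketch of that reduction (not the paper's Hungarian-method algorithm with weighted Voronoi diagrams), assuming the complete bipartite graph between two point sets with Euclidean edge costs: if $m(v)$ is the cost of the cheapest edge at $v$, the optimum equals $\sum_v m(v) - \max_M \sum_{uv \in M} (m(u) + m(v) - c(uv))$, where $M$ ranges over matchings. Both the matching search and the reference check are brute force, so this is only for tiny inputs; the function names are hypothetical.

```python
import itertools
import math


def min_edge_cover_via_matching(red, blue):
    """Min-cost edge cover of the complete bipartite graph red x blue
    (Euclidean costs) via the reduction to maximum-weight matching.
    The matching is found by exhaustive search, so inputs must be tiny."""
    cost = [[math.dist(r, b) for b in blue] for r in red]
    # m(v): cost of the cheapest edge incident to each vertex
    m_red = [min(row) for row in cost]
    m_blue = [min(cost[i][j] for i in range(len(red))) for j in range(len(blue))]
    best_gain = 0.0  # the empty matching has gain 0

    def search(i, used, gain):
        nonlocal best_gain
        if i == len(red):
            best_gain = max(best_gain, gain)
            return
        search(i + 1, used, gain)  # leave red vertex i unmatched
        for j in range(len(blue)):
            if j not in used:
                w = m_red[i] + m_blue[j] - cost[i][j]
                search(i + 1, used | {j}, gain + w)

    search(0, frozenset(), 0.0)
    return sum(m_red) + sum(m_blue) - best_gain


def min_edge_cover_brute_force(red, blue):
    """Reference answer: try every subset of edges and keep the cheapest cover."""
    edges = [(i, j) for i in range(len(red)) for j in range(len(blue))]
    best = math.inf
    for k in range(1, len(edges) + 1):
        for subset in itertools.combinations(edges, k):
            if ({i for i, _ in subset} == set(range(len(red)))
                    and {j for _, j in subset} == set(range(len(blue)))):
                best = min(best, sum(math.dist(red[i], blue[j]) for i, j in subset))
    return best
```

For red points (0, 0), (1, 0), (5, 5) and blue points (0, 1), (6, 5), both functions return $2 + \sqrt{2}$: the cover uses the edges of cost 1, $\sqrt{2}$ and 1.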
-
Fast and Robust Feature Matching for RGB-D Based Localization
Authors:
Miguel Heredia,
Felix Endres,
Wolfram Burgard,
Rafael Sanz
Abstract:
In this paper we present a novel approach to global localization using an RGB-D camera in maps of visual features. For large maps, the performance of pure image matching techniques degrades in terms of robustness and computational cost. In particular, repeated occurrences of similar features due to repeating structure in the world (e.g., doorways, chairs) or missing associations between observations pose critical challenges to visual localization. We address these challenges using a two-step approach. We first estimate a candidate pose using a few correspondences between features of the current camera frame and the feature map. The initial set of correspondences is established by proximity in feature space. The initial pose estimate is used in the second step to guide spatial matching of features in 3D, i.e., searching for associations where the image features are expected to be found in the map. A RANSAC algorithm is used to compute a fine estimation of the pose from the correspondences. Our approach clearly outperforms localization based on feature matching exclusively in feature space, both in terms of estimation accuracy and robustness to failure, and allows for global localization in real time (30 Hz).
Submitted 2 February, 2015;
originally announced February 2015.
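The RANSAC step of this approach, computing a pose from putative correspondences, can be illustrated with a simplified sketch. It assumes planar (2D) points and a rigid rotation-plus-translation model instead of the full 3D camera pose, uses two correspondences as the minimal sample, and refits the model by least squares on the best consensus set; `iters`, `thresh` and `seed` are illustrative defaults, not values from the paper.

```python
import math
import random


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # center both sets
        cross += px * qy - py * qx
        dot += px * qx + py * qy
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(model, p):
    """Apply rotation theta then translation (tx, ty) to point p."""
    theta, tx, ty = model
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def ransac_pose(src, dst, iters=200, thresh=0.05, seed=0):
    """Estimate a 2D rigid pose from correspondences src[k] -> dst[k]."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        i, j = rng.sample(range(len(src)), 2)  # minimal sample
        if math.dist(src[i], src[j]) < 1e-9:
            continue  # degenerate sample: rotation is undefined
        model = fit_rigid_2d([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in range(len(src))
                   if math.dist(transform(model, src[k]), dst[k]) < thresh]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no consistent pose found")
    # Refine on the consensus set
    refined = fit_rigid_2d([src[k] for k in best], [dst[k] for k in best])
    return refined, best
```

With seven exact correspondences generated from a known pose plus three gross outliers, the consensus set recovers exactly the seven inliers and the refit returns the original angle and translation.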
-
On $k$-Gons and $k$-Holes in Point Sets
Authors:
Oswin Aichholzer,
Ruy Fabila-Monroy,
Hernán González-Aguilar,
Thomas Hackl,
Marco A. Heredia,
Clemens Huemer,
Jorge Urrutia,
Pavel Valtr,
Birgit Vogtenhuber
Abstract:
We consider a variation of the classical Erdős-Szekeres problems on the existence and number of convex $k$-gons and $k$-holes (empty $k$-gons) in a set of $n$ points in the plane. Allowing the $k$-gons to be non-convex, we show bounds and structural results on maximizing and minimizing their numbers. Most notably, for any $k$ and sufficiently large $n$, we give a quadratic lower bound for the number of $k$-holes, and show that this number is maximized by sets in convex position.
Submitted 30 August, 2014;
originally announced September 2014.
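For $k = 3$ every triangle is convex, so 3-holes are simply empty triangles. A brute-force sketch of the definition, assuming distinct points in general position (no three collinear); it runs in $O(n^4)$ time and only illustrates what is being counted, not the bounds proved in the paper. The function names are hypothetical.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def strictly_inside(p, a, b, c):
    """True if p lies in the interior of triangle abc."""
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def count_3_holes(points):
    """Count triangles with vertices in `points` containing no other point."""
    count = 0
    for a, b, c in combinations(points, 3):
        others = (p for p in points if p not in (a, b, c))
        if not any(strictly_inside(p, a, b, c) for p in others):
            count += 1
    return count
```

Four points in convex position (a square) give 4 empty triangles, while (0, 0), (4, 0), (0, 4) with (1, 1) inside give 3, consistent with convex position maximizing the count.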