-
GenPlanX. Generation of Plans and Execution
Authors:
Daniel Borrajo,
Giuseppe Canonaco,
Tomás de la Rosa,
Alfredo Garrachón,
Sriram Gopalakrishnan,
Simerjot Kaur,
Marianela Morales,
Sunandita Patra,
Alberto Pozanco,
Keshav Ramani,
Charese Smiley,
Pietro Totis,
Manuela Veloso
Abstract:
Classical AI Planning techniques generate sequences of actions for complex tasks. However, they lack the ability to understand planning tasks when provided using natural language. The advent of Large Language Models (LLMs) has introduced novel capabilities in human-computer interaction. In the context of planning tasks, LLMs have been shown to be particularly good at interpreting human intents, among other uses. This paper introduces GenPlanX, which integrates LLMs for natural language-based description of planning tasks with a classical AI planning engine, alongside an execution and monitoring framework. We demonstrate the efficacy of GenPlanX in assisting users with office-related tasks, highlighting its potential to streamline workflows and enhance productivity through seamless human-AI collaboration.
Submitted 12 June, 2025;
originally announced June 2025.
-
Binary classification for perceived quality of headlines and links on worldwide news websites, 2018-2024
Authors:
Austin McCutcheon,
Thiago E. A. de Oliveira,
Aleksandr Zheleznov,
Chris Brogly
Abstract:
The proliferation of online news enables potential widespread publication of perceived low-quality news headlines/links. As a result, we investigated whether it was possible to automatically distinguish perceived lower-quality news headlines/links from perceived higher-quality headlines/links. We evaluated twelve machine learning models on a binary, balanced dataset of 57,544,214 worldwide news website links/headings from 2018-2024 (28,772,107 per class) with 115 extracted linguistic features. Binary labels for each text were derived from scores based on expert consensus regarding the respective news domain quality. Traditional ensemble methods, particularly the bagging classifier, had strong performance (88.1% accuracy, 88.3% F1, 80/20 train/test split). Fine-tuned DistilBERT achieved the highest accuracy (90.3%, 80/20 train/test split) but required more training time. The results suggest that both NLP features with traditional classifiers and deep learning models can effectively differentiate perceived news headline/link quality, with some trade-off between predictive performance and train time.
Submitted 11 June, 2025;
originally announced June 2025.
-
Multiresolution Analysis and Statistical Thresholding on Dynamic Networks
Authors:
Raphaël Romero,
Tijl De Bie,
Nick Heard,
Alexander Modell
Abstract:
Detecting structural change in dynamic network data has wide-ranging applications. Existing approaches typically divide the data into time bins, extract network features within each bin, and then compare these features over time. This introduces an inherent tradeoff between temporal resolution and the statistical stability of the extracted features. Despite this tradeoff, reminiscent of time-frequency tradeoffs in signal processing, most methods rely on a fixed temporal resolution. Choosing an appropriate resolution parameter is typically difficult and can be especially problematic in domains like cybersecurity, where anomalous behavior may emerge at multiple time scales. We address this challenge by proposing ANIE (Adaptive Network Intensity Estimation), a multi-resolution framework designed to automatically identify the time scales at which network structure evolves, enabling the joint detection of both rapid and gradual changes. Modeling interactions as Poisson processes, our method proceeds in two steps: (1) estimating a low-dimensional subspace of node behavior, and (2) deriving a set of novel empirical affinity coefficients that quantify change in interaction intensity between latent factors and support statistical testing for structural change across time scales. We provide theoretical guarantees for subspace estimation and the asymptotic behavior of the affinity coefficients, enabling model-based change detection. Experiments on synthetic networks show that ANIE adapts to the appropriate time resolution and is able to capture sharp structural changes while remaining robust to noise. Furthermore, applications to real-world data showcase the practical benefits of ANIE's multiresolution approach to detecting structural change over fixed resolution methods.
Submitted 1 June, 2025;
originally announced June 2025.
-
AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models
Authors:
Conor Heins,
Toon Van de Maele,
Alexander Tschantz,
Hampus Linander,
Dimitrije Markovic,
Tommaso Salvatori,
Corrado Pezzato,
Ozan Catal,
Ran Wei,
Magnus Koudahl,
Marco Perin,
Karl Friston,
Tim Verbelen,
Christopher Buckley
Abstract:
Current deep reinforcement learning (DRL) approaches achieve state-of-the-art performance in various domains, but struggle with data efficiency compared to human learning, which leverages core priors about objects and their interactions. Active inference offers a principled framework for integrating sensory information with prior knowledge to learn a world model and quantify the uncertainty of its own beliefs and predictions. However, active inference models are usually crafted for a single task with bespoke knowledge, so they lack the domain flexibility typical of DRL approaches. To bridge this gap, we propose a novel architecture that integrates a minimal yet expressive set of core priors about object-centric dynamics and interactions to accelerate learning in low-data regimes. The resulting approach, which we call AXIOM, combines the usual data efficiency and interpretability of Bayesian approaches with the across-task generalization usually associated with DRL. AXIOM represents scenes as compositions of objects, whose dynamics are modeled as piecewise linear trajectories that capture sparse object-object interactions. The structure of the generative model is expanded online by growing and learning mixture models from single events and periodically refined through Bayesian model reduction to induce generalization. AXIOM masters various games within only 10,000 interaction steps, with both a small number of parameters compared to DRL, and without the computational expense of gradient-based optimization.
Submitted 30 May, 2025;
originally announced May 2025.
-
BiMi Sheets: Infosheets for bias mitigation methods
Authors:
MaryBeth Defrance,
Guillaume Bied,
Maarten Buyl,
Jefrey Lijffijt,
Tijl De Bie
Abstract:
Over the past 15 years, hundreds of bias mitigation methods have been proposed in the pursuit of fairness in machine learning (ML). However, algorithmic biases are domain-, task-, and model-specific, leading to a `portability trap': bias mitigation solutions in one context may not be appropriate in another. Thus, a myriad of design choices have to be made when creating a bias mitigation method, such as the formalization of fairness it pursues, and where and how it intervenes in the ML pipeline. This creates challenges in benchmarking and comparing the relative merits of different bias mitigation methods, and limits their uptake by practitioners.
We propose BiMi Sheets as a portable, uniform guide to document the design choices of any bias mitigation method. This enables researchers and practitioners to quickly learn its main characteristics and to compare them with their desiderata. Furthermore, the sheets' structure allows for the creation of a structured database of bias mitigation methods. In order to foster the sheets' adoption, we provide a platform for finding and creating BiMi Sheets at bimisheet.com.
Submitted 28 May, 2025;
originally announced May 2025.
-
Deep Operator Neural Network Model Predictive Control
Authors:
Thomas Oliver de Jong,
Khemraj Shukla,
Mircea Lazar
Abstract:
In this paper, we consider the design of model predictive control (MPC) algorithms based on deep operator neural networks (DeepONets). These neural networks are capable of accurately approximating real- and complex-valued solutions of continuous-time nonlinear systems without relying on recurrent architectures. The DeepONet architecture is made up of two feedforward neural networks: the branch network, which encodes the input function space, and the trunk network, which represents dependencies on temporal variables or initial conditions. Utilizing the original DeepONet architecture as a predictor within MPC for Multi Input Multi Output (MIMO) systems requires multiple branch networks, one for each input, to generate multi-output predictions. Moreover, to predict multiple time steps into the future, the network has to be evaluated multiple times. Motivated by this, we introduce a multi-step DeepONet (MS-DeepONet) architecture that computes multi-step predictions of system outputs from multi-step input sequences in one shot, which is better suited for MPC. We prove that the MS-DeepONet is a universal approximator in terms of multi-step sequence prediction. Additionally, we develop automated hyperparameter selection strategies and implement MPC frameworks using both the standard DeepONet and the proposed MS-DeepONet architectures in PyTorch. The implementation is publicly available on GitHub. Simulation results demonstrate that MS-DeepONet consistently outperforms the standard DeepONet in learning and predictive control tasks across several nonlinear benchmark systems: the van der Pol oscillator, the quadruple tank process, and a cart pendulum unstable system, where it successfully learns and executes multiple swing-up and stabilization policies.
Submitted 23 May, 2025;
originally announced May 2025.
-
AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model
Authors:
Tijmen de Haan,
Yuan-Sen Ting,
Tirthankar Ghosal,
Tuan Dung Nguyen,
Alberto Accomazzi,
Emily Herron,
Vanessa Lama,
Rui Pan,
Azton Wells,
Nesar Ramachandra
Abstract:
General-purpose large language models, despite their broad capabilities, often struggle with specialized domain knowledge, a limitation particularly pronounced in more accessible, lower-parameter versions. This gap hinders their deployment as effective agents in demanding fields such as astronomy. Building on our prior work with AstroSage-8B, this study introduces AstroSage-70B, a significantly larger and more advanced domain-specialized natural-language AI assistant. It is designed for research and education across astronomy, astrophysics, space science, astroparticle physics, cosmology, and astronomical instrumentation. Developed from the Llama-3.1-70B foundation, AstroSage-70B underwent extensive continued pre-training on a vast corpus of astronomical literature, followed by supervised fine-tuning and model merging. Beyond its 70-billion parameter scale, this model incorporates refined datasets, judiciously chosen learning hyperparameters, and improved training procedures, achieving state-of-the-art performance on complex astronomical tasks. Notably, we integrated reasoning chains into the SFT dataset, enabling AstroSage-70B to either answer the user query immediately, or first emit a human-readable thought process. Evaluated on the AstroMLab-1 benchmark -- comprising 4,425 questions from literature withheld during training -- AstroSage-70B achieves state-of-the-art performance. It surpasses all other tested open-weight and proprietary models, including leading systems like o3, Gemini-2.5-Pro, Claude-3.7-Sonnet, Deepseek-R1, and Qwen-3-235B, even those with API costs two orders of magnitude higher. This work demonstrates that domain specialization, when applied to large-scale models, can enable them to outperform generalist counterparts in specialized knowledge areas like astronomy, thereby advancing the frontier of AI capabilities in the field.
Submitted 23 May, 2025;
originally announced May 2025.
-
Comparing Parallel Functional Array Languages: Programming and Performance
Authors:
David van Balen,
Tiziano De Matteis,
Clemens Grelck,
Troels Henriksen,
Aaron W. Hsu,
Gabriele K. Keller,
Thomas Koopman,
Trevor L. McDonell,
Cosmin Oancea,
Sven-Bodo Scholz,
Artjoms Sinkarovs,
Tom Smeding,
Phil Trinder,
Ivo Gabe de Wolff,
Alexandros Nikolaos Ziogas
Abstract:
Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. We systematically compare the designs and implementations of five different functional array languages: Accelerate, APL, DaCe, Futhark, and SaC. We demonstrate the expressiveness of functional array programming by means of four challenging benchmarks, namely N-body simulation, MultiGrid, Quickhull, and Flash Attention. These benchmarks represent a range of application domains and parallel computational models. We argue that the functional array code is much shorter and more comprehensible than the hand-optimized baseline implementations because it omits architecture-specific aspects. Instead, the language implementations generate both multicore and GPU executables from a single source code base. Hence, we further argue that functional array code could more easily be ported to, and optimized for, new parallel architectures than conventional implementations of numerical kernels. We demonstrate this potential by reporting the performance of the five parallel functional array languages on a total of 39 instances of the four benchmarks on both a 32-core AMD EPYC 7313 multicore system and on an NVIDIA A30 GPU. We explore in-depth why each language performs well or not so well on each benchmark and architecture. We argue that the results demonstrate that mature functional array languages have the potential to deliver performance competitive with the best available conventional techniques.
Submitted 13 May, 2025;
originally announced May 2025.
-
JobHop: A Large-Scale Dataset of Career Trajectories
Authors:
Iman Johary,
Raphael Romero,
Alexandru C. Mara,
Tijl De Bie
Abstract:
Understanding labor market dynamics is essential for policymakers, employers, and job seekers. However, comprehensive datasets that capture real-world career trajectories are scarce. In this paper, we introduce JobHop, a large-scale public dataset derived from anonymized resumes provided by VDAB, the public employment service in Flanders, Belgium. Utilizing Large Language Models (LLMs), we process unstructured resume data to extract structured career information, which is then mapped to standardized ESCO occupation codes using a multi-label classification model. This results in a rich dataset of over 2.3 million work experiences, extracted from and grouped into more than 391,000 user resumes and mapped to standardized ESCO occupation codes, offering valuable insights into real-world occupational transitions. This dataset enables diverse applications, such as analyzing labor market mobility, job stability, and the effects of career breaks on occupational transitions. It also supports career path prediction and other data-driven decision-making processes. To illustrate its potential, we explore key dataset characteristics, including job distributions, career breaks, and job transitions, demonstrating its value for advancing labor market research.
Submitted 12 May, 2025;
originally announced May 2025.
-
A probabilistic view on Riemannian machine learning models for SPD matrices
Authors:
Thibault de Surrel,
Florian Yger,
Fabien Lotte,
Sylvain Chevallier
Abstract:
The goal of this paper is to show how different machine learning tools on the Riemannian manifold $\mathcal{P}_d$ of Symmetric Positive Definite (SPD) matrices can be united under a probabilistic framework. For this, we will need several Gaussian distributions defined on $\mathcal{P}_d$. We will show how popular classifiers on $\mathcal{P}_d$ can be reinterpreted as Bayes Classifiers using these Gaussian distributions. These distributions will also be used for outlier detection and dimension reduction. By showing that those distributions are pervasive in the tools used on $\mathcal{P}_d$, we allow for other machine learning tools to be extended to $\mathcal{P}_d$.
Submitted 5 May, 2025;
originally announced May 2025.
-
Photoshop Batch Rendering Using Actions for Stylistic Video Editing
Authors:
Tessa De La Fuente
Abstract:
My project looks at an efficient workflow for creative image/video editing using the Adobe Photoshop Actions tool and Batch Processing system. This approach to video editing through Photoshop shifts creative workflow management by integrating industry-leading image manipulation with video editing techniques. Through systematic automation of Actions, users can achieve a simple and consistent application of visual edits across a string of images. This approach provides an alternative method to optimize productivity while ensuring uniform results across image collections through a post-processing pipeline.
Submitted 2 May, 2025;
originally announced May 2025.
-
Quaternion Domain Super MDS for 3D Localization
Authors:
Keigo Masuoka,
Takumi Takahashi,
Giuseppe Thadeu Freitas de Abreu,
Hideki Ochiai
Abstract:
We propose a novel low-complexity three-dimensional (3D) localization algorithm for wireless sensor networks, termed quaternion-domain super multidimensional scaling (QD-SMDS). This algorithm reformulates the conventional SMDS, which was originally developed in the real domain, into the quaternion domain. By representing 3D coordinates as quaternions, the method enables the construction of a rank-1 Gram edge kernel (GEK) matrix that integrates both relative distance and angular (phase) information between nodes, maximizing the noise reduction effect achieved through low-rank truncation via singular value decomposition (SVD). The simulation results indicate that the proposed method demonstrates a notable enhancement in localization accuracy relative to the conventional SMDS algorithm, particularly in scenarios characterized by substantial measurement errors.
Submitted 28 April, 2025; v1 submitted 24 April, 2025;
originally announced April 2025.
-
Expected Free Energy-based Planning as Variational Inference
Authors:
Bert de Vries,
Wouter Nuijten,
Thijs van de Laar,
Wouter Kouw,
Sepideh Adamiat,
Tim Nisslbeck,
Mykola Lukashchuk,
Hoang Minh Huu Nguyen,
Marco Hidalgo Araya,
Raphael Tresor,
Thijs Jenneskens,
Ivana Nikoloska,
Raaja Ganapathy Subramanian,
Bart van Erp,
Dmitry Bagaev,
Albert Podusenko
Abstract:
We address the problem of planning under uncertainty, where an agent must choose actions that not only achieve desired outcomes but also reduce uncertainty. Traditional methods often treat exploration and exploitation as separate objectives, lacking a unified inferential foundation. Active inference, grounded in the Free Energy Principle, provides such a foundation by minimizing Expected Free Energy (EFE), a cost function that combines utility with epistemic drives, such as ambiguity resolution and novelty seeking. However, the computational burden of EFE minimization has remained a significant obstacle to its scalability. In this paper, we show that EFE-based planning arises naturally from minimizing a variational free energy functional on a generative model augmented with preference and epistemic priors. This result reinforces theoretical consistency with the Free Energy Principle by casting planning under uncertainty itself as a form of variational inference. Our formulation yields policies that jointly support goal achievement and information gain, while incorporating a complexity term that accounts for bounded computational resources. This unifying framework connects and extends existing methods, enabling scalable, resource-aware implementations of active inference agents.
Submitted 23 April, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
FocusNet: Transformer-enhanced Polyp Segmentation with Local and Pooling Attention
Authors:
Jun Zeng,
KC Santosh,
Deepak Rajan Nayak,
Thomas de Lange,
Jonas Varkey,
Tyler Berzin,
Debesh Jha
Abstract:
Colonoscopy is vital in the early diagnosis of colorectal polyps. Regular screenings can effectively prevent benign polyps from progressing to colorectal cancer (CRC). While deep learning has made impressive strides in polyp segmentation, most existing models are trained on single-modality and single-center data, making them less effective in real-world clinical environments. To overcome these limitations, we propose FocusNet, a Transformer-enhanced focus attention network designed to improve polyp segmentation. FocusNet incorporates three essential modules: the Cross-semantic Interaction Decoder Module (CIDM) for generating coarse segmentation maps, the Detail Enhancement Module (DEM) for refining shallow features, and the Focus Attention Module (FAM) for balancing local detail and global context through local and pooling attention mechanisms. We evaluate our model on PolypDB, a newly introduced dataset with multi-modality and multi-center data for building more reliable segmentation methods. Extensive experiments showed that FocusNet consistently outperforms existing state-of-the-art approaches, with high Dice coefficients of 82.47% on the BLI modality, 88.46% on FICE, 92.04% on LCI, 82.09% on NBI, and 93.42% on WLI, demonstrating its accuracy and robustness across five different modalities. The source code for FocusNet is available at https://github.com/JunZengz/FocusNet.
Submitted 18 April, 2025;
originally announced April 2025.
-
Single-shot Star-convex Polygon-based Instance Segmentation for Spatially-correlated Biomedical Objects
Authors:
Trina De,
Adrian Urbanski,
Artur Yakimovich
Abstract:
Biomedical images often contain objects known to be spatially correlated or nested due to their inherent properties, leading to semantic relations. Examples include cell nuclei being nested within eukaryotic cells and colonies growing exclusively within their culture dishes. While these semantic relations bear key importance, detection tasks are often formulated independently, requiring multi-shot analysis pipelines. Importantly, spatial correlation could constitute a fundamental prior facilitating learning of more meaningful representations for tasks like instance segmentation. This knowledge has, thus far, not been utilised by the biomedical computer vision community. We argue that the instance segmentation of two or more categories of objects can be achieved in parallel. We achieve this via two architectures, HydraStarDist (HSD) and the novel HSD-WBR, both based on the widely used StarDist (SD), to take advantage of the star-convexity of our target objects. HSD and HSD-WBR are constructed to take the interactions between objects into account as constraints. HSD implicitly incorporates spatial correlation priors based on object interaction through a joint encoder. HSD-WBR further enforces the prior in a regularisation layer with our proposed penalty, the Within Boundary Regularisation Penalty (WBR). Both architectures achieve nested instance segmentation in a single shot. We demonstrate their competitiveness based on $IoU_R$ and AP, and their superiority on a new, task-relevant criterion, Joint TP rate (JTPR), compared to their baseline SD and Cellpose. Our approach can be further modified to capture partial-inclusion/-exclusion in multi-object interactions in fluorescent or brightfield microscopy or digital imaging. Finally, our strategy suggests gains by making this learning single-shot and computationally efficient.
Submitted 16 April, 2025;
originally announced April 2025.
-
Frequency Hopping Waveform Design for Secure Integrated Sensing and Communications
Authors:
Ali Khandan Boroujeni,
Giuseppe Thadeu Freitas de Abreu,
Stefan Köpsell,
Ghazal Bagheri,
Kuranage Roche Rayan Ranasinghe,
Rafael F. Schaefer
Abstract:
We introduce a comprehensive approach to enhance the security, privacy, and sensing capabilities of integrated sensing and communications (ISAC) systems by leveraging random frequency agility (RFA) and random pulse repetition interval (PRI) agility (RPA) techniques. The combination of these techniques, which we refer to collectively as random frequency and PRI agility (RFPA), with channel reciprocity-based key generation (CRKG) obfuscates both Doppler frequency and PRIs, significantly hindering the chances that passive adversaries can successfully estimate radar parameters. In addition, a hybrid information embedding method integrating amplitude shift keying (ASK), phase shift keying (PSK), index modulation (IM), and spatial modulation (SM) is incorporated to increase the achievable bit rate of the system significantly. Next, a sparse-matched filter receiver design is proposed to efficiently decode the embedded information with a low bit error rate (BER). Finally, a novel RFPA-based secret generation scheme using CRKG ensures secure code creation without a coordinating authority. The improved range and velocity estimation and reduced clutter effects achieved with the method are demonstrated via the evaluation of the ambiguity function (AF) of the proposed waveforms.
Submitted 14 April, 2025;
originally announced April 2025.
-
On the Temporal Question-Answering Capabilities of Large Language Models Over Anonymized Data
Authors:
Alfredo Garrachón Ruiz,
Tomás de la Rosa,
Daniel Borrajo
Abstract:
The applicability of Large Language Models (LLMs) in temporal reasoning tasks over data that is not present during training is still a field that remains to be explored. In this paper we work on this topic, focusing on structured and semi-structured anonymized data. We not only develop a direct LLM pipeline, but also compare various methodologies and conduct an in-depth analysis. We identified and examined seventeen common temporal reasoning tasks in natural language, focusing on their algorithmic components. To assess LLM performance, we created the Reasoning and Answering Temporal Ability dataset (RATA), featuring semi-structured anonymized data to ensure reliance on reasoning rather than on prior knowledge. We compared several methodologies, involving SoTA techniques such as Tree-of-Thought, self-reflexion and code execution, tuned specifically for this scenario. Our results suggest that achieving scalable and reliable solutions requires more than just standalone LLMs, highlighting the need for integrated approaches.
Submitted 10 April, 2025;
originally announced April 2025.
-
Post-Quantum Wireless-based Key Encapsulation Mechanism via CRYSTALS-Kyber for Resource-Constrained Devices
Authors:
M. A. González de la Torre,
I. A. Morales Sandoval,
Giuseppe Thadeu Freitas de Abreu,
L. Hernández Encinas
Abstract:
We consider the problem of adapting a Post-Quantum cryptosystem to be used in resource-constrained devices, such as those typically used in Device-to-Device and Internet of Things systems. In particular, we propose leveraging the characteristics of wireless communications channels to minimize the complexity of implementation of a Post-Quantum public key encryption scheme, without diminishing its security. To that end, we focus on the adaptation of a well-known cryptosystem, namely CRYSTALS-Kyber, so as to enable its direct integration into the lowest layer of the communication stack, the physical layer, defining two new transport schemes for CRYSTALS-Kyber to be used in Device-to-Device communications, both of which are modeled under a wireless channel subject to Additive White Gaussian Noise, using a 4 Quadrature Amplitude Modulation constellation and a BCH-code to communicate CRYSTALS-Kyber's polynomial coefficients. Simulation results demonstrate the viability of the adapted Kyber algorithm due to its low key error probability, while maintaining the security reductions of the original Kyber by considering the error distribution imposed by the channel on the cipher.
Submitted 6 April, 2025;
originally announced April 2025.
-
negativas: a prototype for searching and classifying sentential negation in speech data
Authors:
Túlio Sousa de Gois,
Paloma Batista Cardoso
Abstract:
Negation is a universal feature of natural languages. In Brazilian Portuguese, the most commonly used negation particle is não, which can scope over nouns or verbs. When it scopes over a verb, não can occur in three positions: pre-verbal (NEG1), double negation (NEG2), or post-verbal (NEG3), e.g., não gosto, não gosto não, gosto não ("I do not like it"). From a variationist perspective, these structures are different forms of expressing negation. Pragmatically, they serve distinct communicative functions, such as politeness and modal evaluation. Despite their grammatical acceptability, these forms differ in frequency. NEG1 dominates across Brazilian regions, while NEG2 and NEG3 appear more rarely, suggesting their use is contextually restricted. This low frequency challenges research, often resulting in subjective, non-generalizable interpretations of verbal negation with não. To address this, we developed negativas, a tool for automatically identifying NEG1, NEG2, and NEG3 in transcribed data. The tool's development involved four stages: i) analyzing a dataset of 22 interviews from the Falares Sergipanos database, annotated by three linguists, ii) writing code using natural language processing (NLP) techniques, iii) running the tool, iv) evaluating accuracy. Inter-annotator consistency, measured using Fleiss' Kappa, was moderate (0.57). The tool identified 3,338 instances of não, classifying 2,085 as NEG1, NEG2, or NEG3, achieving a 93% success rate. However, negativas has limitations. NEG1 accounted for 91.5% of identified structures, while NEG2 and NEG3 represented 7.2% and 1.2%, respectively. The tool struggled with NEG2, sometimes misclassifying instances as overlapping structures (NEG1/NEG2/NEG3).
Submitted 5 April, 2025;
originally announced April 2025.
-
What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
Authors:
Sander Noels,
Guillaume Bied,
Maarten Buyl,
Alexander Rogiers,
Yousra Fettach,
Jefrey Lijffijt,
Tijl De Bie
Abstract:
Large Language Models (LLMs) are increasingly deployed as gateways to information, yet their content moderation practices remain underexplored. This work investigates the extent to which LLMs refuse to answer or omit information when prompted on political topics. To do so, we distinguish between hard censorship (i.e., generated refusals, error messages, or canned denial responses) and soft censorship (i.e., selective omission or downplaying of key elements), which we identify in LLMs' responses when asked to provide information on a broad range of political figures. Our analysis covers 14 state-of-the-art models from Western countries, China, and Russia, prompted in all six official United Nations (UN) languages. Our analysis suggests that although censorship is observed across the board, it is predominantly tailored to an LLM provider's domestic audience and typically manifests as either hard censorship or soft censorship (though rarely both concurrently). These findings underscore the need for ideological and geographic diversity among publicly available LLMs, and greater transparency in LLM moderation strategies to facilitate informed user choices. All data are made freely available.
Submitted 4 April, 2025;
originally announced April 2025.
-
A Robust Routing Protocol for 5G Mesh Networks
Authors:
Niclas Führling,
Ivan Alexander Morales Sandoval,
Giuseppe Thadeu Freitas de Abreu
Abstract:
We consider a novel routing protocol suitable for ad-hoc networks with dynamically changing topologies, such as DECT 2020 NR (NR+) systems, which often lead to missing links between the nodes and thus, incomplete or inefficient routes. A key point of the proposed protocol is the combination of network discovery and matrix completion techniques, which allow the nodes to establish communication paths efficiently and reliably. Additionally, multihop localization is performed to estimate the location of the nodes without needing to broadcast each node's geographical position, thus preserving privacy during the routing process and enabling nodes in the network to independently find potentially missing paths in a decentralized manner instead of flooding the whole network. Simulation results illustrate the good performance of the proposed technique in terms of the average number of hops of the obtained routes in different scenarios, with different network densities and amounts of incompleteness.
Submitted 19 March, 2025;
originally announced March 2025.
-
Parallel Minimum Cost Flow in Near-Linear Work and Square Root Depth for Dense Instances
Authors:
Jan van den Brand,
Hossein Gholizadeh,
Yonggang Jiang,
Tijn de Vos
Abstract:
For $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities, we provide a randomized parallel algorithm for the minimum cost flow problem with $\tilde O(m+n^{1.5})$ work and $\tilde O(\sqrt{n})$ depth. On moderately dense graphs ($m>n^{1.5}$), our algorithm is the first one to achieve both near-linear work and sub-linear depth. Previous algorithms either achieve almost optimal work but are highly sequential [Chen, Kyng, Liu, Peng, Gutenberg, Sachdev, FOCS'22], or achieve sub-linear depth but use super-linear work [Lee, Sidford, FOCS'14], [Orlin, Stein, Oper. Res. Lett.'93]. Our result also leads to improvements for the special cases of max flow, bipartite maximum matching, shortest paths, and reachability. Notably, the previous algorithms achieving near-linear work for shortest paths and reachability all have depth $n^{o(1)}\cdot \sqrt{n}$ [Fischer, Haeupler, Latypov, Roeyskoe, Sulser, SOSA'25], [Liu, Jambulapati, Sidford, FOCS'19].
Our algorithm consists of a parallel implementation of [van den Brand, Lee, Liu, Saranurak, Sidford, Song, Wang, STOC'21]. One important building block is a \emph{dynamic} parallel expander decomposition, which we show how to obtain from the recent parallel expander decomposition of [Chen, Meierhans, Probst Gutenberg, Saranurak, SODA'25].
Submitted 17 March, 2025;
originally announced March 2025.
-
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Authors:
Ahmed Nassar,
Andres Marafioti,
Matteo Omenetti,
Maksym Lysak,
Nikolaos Livathinos,
Christoph Auer,
Lucas Morin,
Rafael Teixeira de Lima,
Yusik Kim,
A. Said Gurbuz,
Michele Dolfi,
Miquel Farré,
Peter W. J. Staar
Abstract:
We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page elements in their full context with location. Unlike existing approaches that rely on large foundational models, or ensemble solutions that rely on handcrafted pipelines of multiple specialized models, SmolDocling offers an end-to-end conversion for accurately capturing content, structure and spatial location of document elements in a 256M-parameter vision-language model. SmolDocling exhibits robust performance in correctly reproducing document features such as code listings, tables, equations, charts, lists, and more across a diverse range of document types including business documents, academic papers, technical reports, patents, and forms -- significantly extending beyond the commonly observed focus on scientific papers. Additionally, we contribute novel publicly sourced datasets for charts, tables, equations, and code recognition. Experimental results demonstrate that SmolDocling competes with other Vision Language Models that are up to 27 times larger in size, while reducing computational requirements substantially. The model is currently available; datasets will be publicly available soon.
Submitted 14 March, 2025;
originally announced March 2025.
-
Group-robust Machine Unlearning
Authors:
Thomas De Min,
Subhankar Roy,
Stéphane Lathuilière,
Elisa Ricci,
Massimiliano Mancini
Abstract:
Machine unlearning is an emerging paradigm to remove the influence of specific training data (i.e., the forget set) from a model while preserving its knowledge of the rest of the data (i.e., the retain set). Previous approaches assume the forget data to be uniformly distributed from all training datapoints. However, if the data to unlearn is dominant in one group, we empirically show that performance for this group degrades, leading to fairness issues. This work tackles the overlooked problem of non-uniformly distributed forget sets, which we call group-robust machine unlearning, by presenting a simple, effective strategy that mitigates the performance loss in dominant groups via sample distribution reweighting. Moreover, we present MIU (Mutual Information-aware Machine Unlearning), the first approach for group robustness in approximate machine unlearning. MIU minimizes the mutual information between model features and group information, achieving unlearning while reducing performance degradation in the dominant group of the forget set. Additionally, MIU exploits sample distribution reweighting and mutual information calibration with the original model to preserve group robustness. We conduct experiments on three datasets and show that MIU outperforms standard methods, achieving unlearning without compromising model robustness. Source code available at https://github.com/tdemin16/group-robust_machine_unlearning.
Submitted 12 March, 2025;
originally announced March 2025.
-
KAN-Mixers: a new deep learning architecture for image classification
Authors:
Jorge Luiz dos Santos Canuto,
Linnyer Beatrys Ruiz Aylon,
Rodrigo Clemente Thom de Souza
Abstract:
Due to their effective performance, Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures have become the standard for solving computer vision tasks. Such architectures require large data sets and rely on convolution and self-attention operations. In 2021, MLP-Mixer emerged, an architecture that relies only on Multilayer Perceptron (MLP) and achieves extremely competitive results when compared to CNNs and ViTs. Despite its good performance in computer vision tasks, the MLP-Mixer architecture may not be suitable for refined feature extraction in images. Recently, the Kolmogorov-Arnold Network (KAN) was proposed as a promising alternative to MLP models. KANs promise to improve accuracy and interpretability when compared to MLPs. Therefore, the present work aims to design a new mixer-based architecture, called KAN-Mixers, using KANs as main layers and evaluate its performance, in terms of several performance metrics, in the image classification task. As the main result, the KAN-Mixers model outperformed the MLP, MLP-Mixer and KAN models on the Fashion-MNIST and CIFAR-10 datasets, with average accuracies of 0.9030 and 0.6980, respectively.
Submitted 11 March, 2025;
originally announced March 2025.
-
Biased Heritage: How Datasets Shape Models in Facial Expression Recognition
Authors:
Iris Dominguez-Catena,
Daniel Paternain,
Mikel Galar,
MaryBeth Defrance,
Maarten Buyl,
Tijl De Bie
Abstract:
In recent years, the rapid development of artificial intelligence (AI) systems has raised concerns about our ability to ensure their fairness, that is, how to avoid discrimination based on protected characteristics such as gender, race, or age. While algorithmic fairness is well-studied in simple binary classification tasks on tabular data, its application to complex, real-world scenarios, such as Facial Expression Recognition (FER), remains underexplored. FER presents unique challenges: it is inherently multiclass, and biases emerge across intersecting demographic variables, each potentially comprising multiple protected groups. We present a comprehensive framework to analyze bias propagation from datasets to trained models in image-based FER systems, while introducing new bias metrics specifically designed for multiclass problems with multiple demographic groups. Our methodology studies bias propagation by (1) inducing controlled biases in FER datasets, (2) training models on these biased datasets, and (3) analyzing the correlation between dataset bias metrics and model fairness notions. Our findings reveal that stereotypical biases propagate more strongly to model predictions than representational biases, suggesting that preventing emotion-specific demographic patterns should be prioritized over general demographic balance in FER datasets. Additionally, we observe that biased datasets lead to reduced model accuracy, challenging the assumed fairness-accuracy trade-off.
Submitted 5 March, 2025;
originally announced March 2025.
-
Sensing Movement: Contemporary Dance Workshops with People who are Blind or have Low Vision and Dance Teachers
Authors:
Madhuka Thisuri De Silva,
Jim Smiley,
Sarah Goodwin,
Leona M Holloway,
Matthew Butler
Abstract:
Dance teachers rely primarily on verbal instructions and visual demonstrations to convey key dance concepts and movement. These techniques, however, have limitations in supporting students who are blind or have low vision (BLV). This work explores the role technology can play in supporting instruction for BLV students, as well as improvisation with their instructor. Through a series of design workshops with dance instructors and BLV students, ideas were generated by physically engaging with probes featuring diverse modalities including tactile objects, a body tracked sound and musical probe, and a body tracked controller with vibrational feedback. Implications for the design of supporting technologies were discovered for four contemporary dance learning goals: learning a phrase; improvising; collaborating through movement; and awareness of body and movement qualities. We discuss the potential of numerous multi-sensory methods and artefacts, and present design considerations for technologies to support meaningful dance instruction and participation.
Submitted 4 March, 2025;
originally announced March 2025.
-
A user-friendly SPARQL query editor powered by lightweight metadata
Authors:
Vincent Emonet,
Ana-Claudia Sima,
Tarcisio Mendes de Farias
Abstract:
SPARQL query editors often lack intuitive interfaces to aid SPARQL-savvy users to write queries. To address this issue, we propose an easy-to-deploy, triple store-agnostic and open-source query editor that offers three main features: (i) automatic query example rendering, (ii) precise autocomplete based on existing triple patterns including within SERVICE clauses, and (iii) a data-aware schema visualization. It can be easily set up with a custom HTML element. The tool has been successfully tested on various public endpoints, and is deployed online at https://sib-swiss.github.io/sparql-editor with open-source code available at https://github.com/sib-swiss/sparql-editor.
Submitted 22 April, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions
Authors:
Ruben T. Lucassen,
Sander P. J. Moonemans,
Tijn van de Luijtgaarden,
Gerben E. Breimer,
Willeke A. M. Blokx,
Mitko Veta
Abstract:
Millions of melanocytic skin lesions are examined by pathologists each year, the majority of which concern common nevi (i.e., ordinary moles). While most of these lesions can be diagnosed in seconds, writing the corresponding pathology report is much more time-consuming. Automating part of the report writing could, therefore, alleviate the increasing workload of pathologists. In this work, we develop a vision-language model specifically for the pathology domain of cutaneous melanocytic lesions. The model follows the Contrastive Captioner framework and was trained and evaluated using a melanocytic lesion dataset of 42,512 H&E-stained whole slide images and 19,645 corresponding pathology reports. Our results show that the quality scores of model-generated reports were on par with pathologist-written reports for common nevi, assessed by an expert pathologist in a reader study. While report generation proved to be more difficult for rare melanocytic lesion subtypes, the cross-modal retrieval performance for these cases was considerably better.
Submitted 27 February, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation
Authors:
Ruben T. Lucassen,
Tijn van de Luijtgaarden,
Sander P. J. Moonemans,
Gerben E. Breimer,
Willeke A. M. Blokx,
Mitko Veta
Abstract:
Vision-language models in pathology enable multimodal case retrieval and automated report generation. Many of the models developed so far, however, have been trained on pathology reports that include information which cannot be inferred from paired whole slide images (e.g., patient history), potentially leading to hallucinated sentences in generated reports. To this end, we investigate how the selection of information from pathology reports for vision-language modeling affects the quality of the multimodal representations and generated reports. More concretely, we compare a model trained on full reports against a model trained on preprocessed reports that only include sentences describing the cell and tissue appearances based on the H&E-stained slides. For the experiments, we built upon the BLIP-2 framework and used a cutaneous melanocytic lesion dataset of 42,433 H&E-stained whole slide images and 19,636 corresponding pathology reports. Model performance was assessed using image-to-text and text-to-image retrieval, as well as qualitative evaluation of the generated reports by an expert pathologist. Our results demonstrate that text preprocessing prevents hallucination in report generation. Despite the improvement in the quality of the generated reports, training the vision-language model on full reports showed better cross-modal retrieval performance.
Submitted 6 June, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects
Authors:
Toheeb A. Jimoh,
Tabea De Wille,
Nikola S. Nikolov
Abstract:
Natural Language Processing (NLP) is becoming a dominant subset of artificial intelligence as the need to help machines understand human language looks indispensable. Several NLP applications are ubiquitous, partly due to the myriad datasets being churned out daily through mediums like social networking sites. However, the growing development has not been evident in most African languages due to the persisting resource limitation, among other issues. The Yorùbá language, a tonal and morphologically rich African language, suffers a similar fate, resulting in limited NLP usage. To encourage further research towards improving this situation, this systematic literature review aims to comprehensively analyse studies addressing NLP development for Yorùbá, identifying challenges, resources, techniques, and applications. A well-defined search string from a structured protocol was employed to search, select, and analyse 105 primary studies between 2014 and 2024 from reputable databases. The review highlights the scarcity of annotated corpora, limited availability of pre-trained language models, and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. It also reveals the prominent techniques, including rule-based methods, among others. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and desertion of language for digital usage. This review synthesises existing research, providing a foundation for advancing NLP for Yorùbá and in African languages generally. It aims to guide future research by identifying gaps and opportunities, thereby contributing to the broader inclusion of Yorùbá and other under-resourced African languages in global NLP advancements.
Submitted 24 February, 2025;
originally announced February 2025.
-
A Parameterized Complexity Analysis of Bounded Height Depth-first Search Trees
Authors:
Lars Jaffke,
Paloma T. de Lima,
Wojciech Nadara,
Emmanuel Sam
Abstract:
Computing bounded depth decompositions is a bottleneck in many applications of the treedepth parameter. The fastest known algorithm, which is due to Reidl, Rossmanith, Sánchez Villaamil, and Sikdar [ICALP 2014], runs in $2^{\mathcal{O}(k^2)}\cdot n$ time and it is a big open problem whether the dependency on $k$ can be improved to $2^{o(k^2)}\cdot n^{\mathcal{O}(1)}$. We show that the related problem of finding DFS trees of bounded height can be solved faster in $2^{\mathcal{O}(k \log k)}\cdot n$ time. As DFS trees are treedepth decompositions, this circumvents the above mentioned bottleneck for this subclass of graphs of bounded treedepth. This problem has recently found attention independently under the name Minimum Height Lineal Topology (MinHLT) and our algorithm gives a positive answer to an open problem posed by Golovach [Dagstuhl Reports, 2023]. We complement our main result by studying the complexity of MinHLT and related problems in several other settings. First, we show that it remains NP-complete on chordal graphs, and give an FPT-algorithm on chordal graphs for the dual problem, asking for a DFS tree of height at most $n-k$, parameterized by $k$. The parameterized complexity of Dual MinHLT on general graphs is wide open. Lastly, we show that Dual MinHLT and two other problems concerned with finding DFS trees with few or many leaves are FPT parameterized by $k$ plus the treewidth of the input graph.
Submitted 23 February, 2025;
originally announced February 2025.
-
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Authors:
Granite Vision Team,
Leonid Karlinsky,
Assaf Arbelle,
Abraham Daniels,
Ahmed Nassar,
Amit Alfassi,
Bo Wu,
Eli Schwartz,
Dhiraj Joshi,
Jovana Kondic,
Nimrod Shabtay,
Pengyuan Li,
Roei Herzig,
Shafiq Abedin,
Shaked Perek,
Sivan Harary,
Udi Barzelay,
Adi Raz Goldfarb,
Aude Oliva,
Ben Wieles,
Bishwaranjan Bhattacharjee,
Brandon Huang,
Christoph Auer,
Dan Gutfreund,
David Beymer
, et al. (38 additional authors not shown)
Abstract:
We introduce Granite Vision, a lightweight large language model with vision capabilities, specifically designed to excel in enterprise use cases, particularly in visual document understanding. Our model is trained on a comprehensive instruction-following dataset, including document-related tasks, such as content extraction from tables, charts, diagrams, sketches, and infographics, as well as general image tasks. The architecture of Granite Vision is centered around visual modality alignment with a decoder-only, 2 billion parameter Granite large language model. Additionally, we introduce a dedicated safety classification approach at test time that leverages a sparse set of attention vectors to identify potentially harmful inputs. Despite its lightweight architecture, Granite Vision achieves strong results in standard benchmarks related to visual document understanding, as well as on the LiveXiv benchmark, which is designed to avoid test set contamination by using a constantly updated corpus of recently published arXiv papers. We are releasing the model under the Apache-2 license, allowing for both research and commercial use, while offering complete visibility into the training data and other relevant details. See https://huggingface.co/ibm-granite/ for model weights.
Submitted 14 February, 2025;
originally announced February 2025.
-
Harnessing omnipresent oscillator networks as computational resource
Authors:
Thomas Geert de Jong,
Hirofumi Notsu,
Kohei Nakajima
Abstract:
Nature is pervaded with oscillatory behavior. In networks of coupled oscillators, patterns can arise when the system synchronizes to an external input. Hence, these networks provide processing and memory of input. We present a universal framework for harnessing oscillator networks as a computational resource. This reservoir computing framework is introduced by the ubiquitous model for phase-locking, the Kuramoto model. We force the Kuramoto model by a nonlinear target-system; then, after substituting the target-system with a trained feedback-loop, it emulates the target-system. Our results are two-fold. Firstly, the trained network inherits performance properties of the Kuramoto model, where all-to-all coupling is performed in linear time with respect to the number of nodes and parameters for synchronization are abundant. Secondly, the learning capabilities of the oscillator network can be explained using the Kuramoto model's order parameter. This work provides the foundation for utilizing nature's oscillator networks as a new class of information processing systems.
Submitted 21 February, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
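The abstract above refers to the Kuramoto order parameter and to all-to-all coupling in linear time. A minimal sketch of the standard textbook mean-field form follows; it is not taken from the paper, and the function names `order_parameter` and `kuramoto_step`, the explicit Euler integrator, and the step size are assumptions made for illustration only.

```python
import cmath
import math


def order_parameter(thetas):
    """Return (r, psi) such that r * exp(i*psi) is the mean of exp(i*theta_j)."""
    z = sum(cmath.exp(1j * t) for t in thetas) / len(thetas)
    return abs(z), cmath.phase(z)


def kuramoto_step(thetas, omegas, coupling, dt):
    """One explicit Euler step of the all-to-all Kuramoto model.

    Uses the identity (K/N) * sum_j sin(theta_j - theta_i)
    = K * r * sin(psi - theta_i), so a step costs O(N) rather than O(N^2).
    The integrator choice is an illustrative assumption.
    """
    r, psi = order_parameter(thetas)
    return [
        t + dt * (w + coupling * r * math.sin(psi - t))
        for t, w in zip(thetas, omegas)
    ]
```

Here `r` close to 1 indicates synchronized phases and `r` close to 0 indicates incoherent phases, which is the usual reading of the order parameter.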
-
FPGA Innovation Research in the Netherlands: Present Landscape and Future Outlook
Authors:
Nikolaos Alachiotis,
Sjoerd van den Belt,
Steven van der Vlugt,
Reinier van der Walle,
Mohsen Safari,
Bruno Endres Forlin,
Tiziano De Matteis,
Zaid Al-Ars,
Roel Jordans,
António J. Sousa de Almeida,
Federico Corradi,
Christiaan Baaij,
Ana-Lucia Varbanescu
Abstract:
FPGAs have transformed digital design by enabling versatile and customizable solutions that balance performance and power efficiency, making them essential for today's diverse computing challenges. Research in the Netherlands, both in academia and industry, plays a major role in developing innovative FPGA solutions. This survey presents the current landscape of FPGA innovation research in the Netherlands by delving into ongoing projects, advancements, and breakthroughs in the field. Focusing on recent research outcomes (within the past 5 years), we have identified five key research areas: a) FPGA architecture, b) FPGA robustness, c) data center infrastructure and high-performance computing, d) programming models and tools, and e) applications. This survey provides in-depth insights beyond a mere snapshot of the current innovation research landscape by highlighting future research directions within each key area; these insights can serve as a foundational resource to inform potential national-level investments in FPGA technology.
Submitted 4 February, 2025;
originally announced February 2025.
-
Entropy-based measure of rock sample heterogeneity derived from micro-CT images
Authors:
Luan Coelho Vieira Silva,
Júlio de Castro Vargas Fernandes,
Felipe Belilaqua Foldes Guimarães,
Pedro Henrique Braga Lisboa,
Carlos Eduardo Menezes dos Anjos,
Thais Fernandes de Matos,
Marcelo Ramalho Albuquerque,
Rodrigo Surmas,
Alexandre Gonçalves Evsukoff
Abstract:
This study presents an automated method for objectively measuring rock heterogeneity via raw X-ray micro-computed tomography (micro-CT) images, thereby addressing the limitations of traditional methods, which are time-consuming, costly, and subjective. Unlike approaches that rely on image segmentation, the proposed method processes micro-CT images directly, identifying textural heterogeneity. The image is partitioned into subvolumes, where attributes are calculated for each one, with entropy serving as a measure of uncertainty. This method adapts to varying sample characteristics and enables meaningful comparisons across distinct sets of samples. It was applied to a dataset consisting of 4,935 images of cylindrical plug samples derived from Brazilian reservoirs. The results showed that the selected attributes play a key role in producing desirable outcomes, such as strong correlations with structural heterogeneity. To assess the effectiveness of our method, we used evaluations provided by four experts who classified 175 samples as either heterogeneous or homogeneous, where each expert assessed a different number of samples. One of the presented attributes demonstrated a statistically significant difference between the homogeneous and heterogeneous samples labelled by all the experts, whereas the other two attributes yielded nonsignificant differences for three out of the four experts. The method was shown to better align with the expert choices than traditional textural attributes known for extracting heterogeneous properties from images. This textural heterogeneity measure provides an additional parameter that can assist in rock characterization, and the automated approach ensures easy reproduction and high cost-effectiveness.
Submitted 1 February, 2025;
originally announced February 2025.
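The abstract above describes partitioning an image into subvolumes, computing an attribute for each, and using entropy as the measure. The sketch below illustrates that general idea on a 2D image only. The attribute used here (mean intensity per tile), the equal-width binning, and the names `block_means` and `shannon_entropy` are all assumptions for illustration and are not the attributes or procedure of the paper.

```python
import math
from collections import Counter


def block_means(image, block):
    """Mean intensity of each non-overlapping block x block tile of a 2D list.

    Mean intensity is a hypothetical stand-in attribute.
    """
    rows, cols = len(image), len(image[0])
    means = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = [image[i][j]
                    for i in range(r, r + block)
                    for j in range(c, c + block)]
            means.append(sum(tile) / len(tile))
    return means


def shannon_entropy(values, bins, lo, hi):
    """Shannon entropy (bits) of values histogrammed into equal-width bins on [lo, hi]."""
    width = (hi - lo) / bins
    # Clamp the top edge so that a value equal to hi falls in the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())
```

Under these assumptions, an image whose tiles all share the same attribute value has zero entropy, while tiles spread evenly across the bins give the maximum, `log2(bins)`.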
-
Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices
Authors:
Thibault de Surrel,
Fabien Lotte,
Sylvain Chevallier,
Florian Yger
Abstract:
Circular and non-flat data distributions are prevalent across diverse domains of data science, yet their specific geometric structures often remain underutilized in machine learning frameworks. A principled approach to accounting for the underlying geometry of such data is pivotal, particularly when extending statistical models, like the pervasive Gaussian distribution. In this work, we tackle these issues by focusing on the manifold of symmetric positive definite (SPD) matrices, a key focus in information geometry. We introduce a non-isotropic wrapped Gaussian by leveraging the exponential map, derive theoretical properties of this distribution, and propose a maximum likelihood framework for parameter estimation. Furthermore, we reinterpret established classifiers on SPD through a probabilistic lens and introduce new classifiers based on the wrapped Gaussian model. Experiments on synthetic and real-world datasets demonstrate the robustness and flexibility of this geometry-aware distribution, underscoring its potential to advance manifold-based data analysis. This work lays the groundwork for extending classical machine learning and statistical methods to more complex and structured data.
Submitted 27 May, 2025; v1 submitted 3 February, 2025;
originally announced February 2025.
-
How Does Microservice Granularity Impact Energy Consumption and Performance? A Controlled Experiment
Authors:
Yiming Zhao,
Tiziano De Matteis,
Justus Bogner
Abstract:
Context: Microservice architectures are a widely used software deployment approach, with benefits regarding flexibility and scalability. However, their impact on energy consumption is poorly understood, and often overlooked in favor of performance and other quality attributes (QAs). One understudied concept in this area is microservice granularity, i.e., over how many services the system functionality is distributed.
Objective: We therefore aim to analyze the relationship between microservice granularity and two critical QAs in microservice-based systems: energy consumption and performance.
Method: We conducted a controlled experiment using two open-source microservice-based systems of different scales: the small Pet Clinic system and the large Train Ticket system. For each system, we created three levels of granularity by merging or splitting services (coarse, medium, and fine) and then exposed them to five levels of request frequency.
Results: Our findings revealed that: i) granularity significantly affected both energy consumption and response time, e.g., in the large system, fine granularity consumed on average 461 J more energy (13%) and added 5.2 ms to response time (14%) compared to coarse granularity; ii) higher request loads significantly increased both energy consumption and response times, with moving from 40 to 400 requests / s resulting in 651 J higher energy consumption (23%) and 41.2 ms longer response times (98%); iii) there is a complex relationship between granularity, system scale, energy consumption, and performance that warrants careful consideration in microservice design. We derive generalizable takeaways from our results.
Conclusion: Microservices practitioners should take our findings into account when making granularity-related decisions, especially for large-scale systems.
Submitted 1 February, 2025;
originally announced February 2025.
-
Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion
Authors:
Nikolaos Livathinos,
Christoph Auer,
Maksym Lysak,
Ahmed Nassar,
Michele Dolfi,
Panos Vagenas,
Cesar Berrospi Ramis,
Matteo Omenetti,
Kasper Dinkla,
Yusik Kim,
Shubham Gupta,
Rafael Teixeira de Lima,
Valery Weber,
Lucas Morin,
Ingmar Meijer,
Viktor Kuropiatnyk,
Peter W. J. Staar
Abstract:
We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion that can parse several types of popular document formats into a unified, richly structured representation. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. Docling is released as a Python package and can be used as a Python API or as a CLI tool. Docling's modular architecture and efficient document representation make it easy to implement extensions, new features, models, and customizations. Docling has already been integrated into other popular open-source frameworks (e.g., LangChain, LlamaIndex, spaCy), making it a natural fit for the processing of documents and the development of high-end applications. The open-source community has fully engaged in using, promoting, and developing for Docling, which gathered 10k stars on GitHub in less than a month and was reported as the No. 1 trending repository on GitHub worldwide in November 2024.
Submitted 27 January, 2025;
originally announced January 2025.
-
Ordinal Exponentiation in Homotopy Type Theory
Authors:
Tom de Jong,
Nicolai Kraus,
Fredrik Nordvall Forsberg,
Chuangjie Xu
Abstract:
We present two seemingly different definitions of constructive ordinal exponentiation, where an ordinal is taken to be a transitive, extensional, and wellfounded order on a set. The first definition is abstract, uses suprema of ordinals, and is solely motivated by the expected equations. The second is more concrete, based on decreasing lists, and can be seen as a constructive version of a classical construction by Sierpiński based on functions with finite support. We show that our two approaches are equivalent (whenever it makes sense to ask the question), and use this equivalence to prove algebraic laws and decidability properties of the exponential. Our work takes place in the framework of homotopy type theory, and all results are formalized in the proof assistant Agda.
Submitted 20 May, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Robust Egoistic Rigid Body Localization
Authors:
Niclas Führling,
Giuseppe Thadeu Freitas de Abreu,
David González G.,
Osvaldo Gonsa
Abstract:
We consider a robust and self-reliant (or "egoistic") variation of the rigid body localization (RBL) problem, in which a primary rigid body seeks to estimate the pose (i.e., location and orientation) of another rigid body (or "target"), relative to its own, without the assistance of external infrastructure, without prior knowledge of the shape of the target, and taking into account the possibility that the available observations are incomplete. Three complementary contributions are then offered for such a scenario. The first is a method to estimate the translation vector between the center point of both rigid bodies, which unlike existing techniques does not require that both objects have the same shape or even the same number of landmark points. This technique is shown to significantly outperform the state-of-the-art (SotA) under complete information, but to be sensitive to data erasures, even when enhanced by matrix completion methods. The second contribution, designed to offer improved performance in the presence of incomplete information, offers a robust alternative to the latter, at the expense of a slight relative loss under complete information. Finally, the third contribution is a scheme for the estimation of the rotation matrix describing the relative orientation of the target rigid body with respect to the primary. Comparisons of the proposed schemes and SotA techniques demonstrate the advantage of the contributed methods in terms of root mean square error (RMSE) performance under fully complete information and incomplete conditions.
Submitted 17 January, 2025;
originally announced January 2025.
-
A Computationally Grounded Framework for Cognitive Attitudes (extended version)
Authors:
Tiago de Lima,
Emiliano Lorini,
Elise Perrotin,
François Schwarzentruber
Abstract:
We introduce a novel language for reasoning about agents' cognitive attitudes of both epistemic and motivational type. We interpret it by means of a computationally grounded semantics using belief bases. Our language includes five types of modal operators for implicit belief, complete attraction, complete repulsion, realistic attraction and realistic repulsion. We give an axiomatization and show that our operators are not mutually expressible and that they can be combined to represent a large variety of psychological concepts including ambivalence, indifference, being motivated, being demotivated and preference. We present a dynamic extension of the language that supports reasoning about the effects of belief change operations. Finally, we provide a succinct formulation of model checking for our languages and a PSPACE model checking algorithm relying on a reduction into TQBF. We present some experimental results for the implemented algorithm on computation time in a concrete example.
Submitted 18 December, 2024;
originally announced December 2024.
-
THÖR-MAGNI Act: Actions for Human Motion Modeling in Robot-Shared Industrial Spaces
Authors:
Tiago Rodrigues de Almeida,
Tim Schreiter,
Andrey Rudenko,
Luigi Palmieiri,
Johannes A. Stork,
Achim J. Lilienthal
Abstract:
Accurate human activity and trajectory prediction are crucial for ensuring safe and reliable human-robot interactions in dynamic environments, such as industrial settings, with mobile robots. Datasets with fine-grained action labels for moving people in industrial environments with mobile robots are scarce, as most existing datasets focus on social navigation in public spaces. This paper introduces the THÖR-MAGNI Act dataset, a substantial extension of the THÖR-MAGNI dataset, which captures participant movements alongside robots in diverse semantic and spatial contexts. THÖR-MAGNI Act provides 8.3 hours of manually labeled participant actions derived from egocentric videos recorded via eye-tracking glasses. These actions, aligned with the provided THÖR-MAGNI motion cues, follow a long-tailed distribution with diversified acceleration, velocity, and navigation distance profiles. We demonstrate the utility of THÖR-MAGNI Act for two tasks: action-conditioned trajectory prediction and joint action and trajectory prediction. We propose two efficient transformer-based models that outperform the baselines to address these tasks. These results underscore the potential of THÖR-MAGNI Act to develop predictive models for enhanced human-robot interaction in complex environments.
Submitted 23 December, 2024; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Your Next State-of-the-Art Could Come from Another Domain: A Cross-Domain Analysis of Hierarchical Text Classification
Authors:
Nan Li,
Bo Kang,
Tijl De Bie
Abstract:
Text classification with hierarchical labels is a prevalent and challenging task in natural language processing. Examples include assigning ICD codes to patient records, tagging patents into IPC classes, assigning EUROVOC descriptors to European legal texts, and more. Despite its widespread applications, a comprehensive understanding of state-of-the-art methods across different domains has been lacking. In this paper, we provide the first comprehensive cross-domain overview with empirical analysis of state-of-the-art methods. We propose a unified framework that positions each method within a common structure to facilitate research. Our empirical analysis yields key insights and guidelines, confirming the necessity of learning across different research areas to design effective methods. Notably, under our unified evaluation pipeline, we achieved new state-of-the-art results by applying techniques beyond their original domains.
Submitted 17 December, 2024;
originally announced December 2024.
-
TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation
Authors:
Alfredo Garrachón Ruiz,
Tomás de la Rosa,
Daniel Borrajo
Abstract:
The inference cost of Large Language Models (LLMs) is a significant challenge due to their computational demands, especially on tasks requiring long outputs. However, natural language often contains redundancy, which presents an opportunity for optimization. We have observed that LLMs can generate distilled language (concise outputs that retain essential meaning) when prompted appropriately. We propose TRIM, a pipeline for saving computational cost in which a shorter distilled output from the LLM is reconstructed into a full narrative by a smaller model with lower inference costs. Our experiments show promising results, particularly in general knowledge domains, with 20.58% saved tokens on average and a tiny decrease in evaluation metrics, hinting that this approach can effectively balance efficiency and accuracy in language processing tasks.
Submitted 18 December, 2024; v1 submitted 10 December, 2024;
originally announced December 2024.
-
The BrowserGym Ecosystem for Web Agent Research
Authors:
Thibault Le Sellier De Chezelles,
Maxime Gasse,
Alexandre Drouin,
Massimo Caccia,
Léo Boisvert,
Megh Thakkar,
Tom Marty,
Rim Assouel,
Sahar Omidi Shayegan,
Lawrence Keunho Jang,
Xing Han Lù,
Ori Yoran,
Dehan Kong,
Frank F. Xu,
Siva Reddy,
Quentin Cappart,
Graham Neubig,
Ruslan Salakhutdinov,
Nicolas Chapados,
Alexandre Lacoste
Abstract:
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs). Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. In an earlier work, Drouin et al. (2024) introduced BrowserGym, which aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature and includes AgentLab, a complementary framework that aids in agent creation, testing, and analysis. Our proposed ecosystem offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. As supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks made available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI's and Anthropic's latest models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
Submitted 28 February, 2025; v1 submitted 6 December, 2024;
originally announced December 2024.
-
Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems
Authors:
Rafael Teixeira de Lima,
Shubham Gupta,
Cesar Berrospi,
Lokesh Mishra,
Michele Dolfi,
Peter Staar,
Panagiotis Vagenas
Abstract:
Retrieval Augmented Generation (RAG) systems are a widespread application of Large Language Models (LLMs) in the industry. While many tools exist empowering developers to build their own systems, measuring their performance locally, with datasets reflective of the system's use cases, is a technological challenge. Solutions to this problem range from non-specific and cheap (most public datasets) to specific and costly (generating data from local documents). In this paper, we show that using public question and answer (Q&A) datasets to assess retrieval performance can lead to non-optimal systems design, and that common tools for RAG dataset generation can lead to unbalanced data. We propose solutions to these issues based on the characterization of RAG datasets through labels and through label-targeted data generation. Finally, we show that fine-tuned small LLMs can efficiently generate Q&A datasets. We believe that these observations are invaluable to the know-your-data step of RAG systems development.
Submitted 29 November, 2024;
originally announced November 2024.
-
Linear Realisability over nets: multiplicatives (long version)
Authors:
Adrien Ragot,
Thomas Seiller,
Lorenzo Tortora de Falco
Abstract:
We provide a new realisability model based on orthogonality for the multiplicative fragment of linear logic, both in presence of generalised axioms (MLL*) and in the standard case (MLL). The novelty is the definition of cut elimination for generalised axioms. We prove that our model is adequate and complete both for MLL* and MLL.
Submitted 26 November, 2024;
originally announced November 2024.
-
Ética para LLMs: o compartilhamento de dados sociolinguísticos
Authors:
Marta Deysiane Alves Faria Sousa,
Raquel Meister Ko. Freitag,
Túlio Sousa de Gois
Abstract:
The collection of speech data carried out in Sociolinguistics has the potential to enhance large language models due to its quality and representativeness. In this paper, we examine the ethical considerations associated with the gathering and dissemination of such data. Additionally, we outline strategies for addressing the sensitivity of speech data, as it may facilitate the identification of informants who contributed with their speech.
Submitted 11 November, 2024;
originally announced November 2024.
-
Persuasion with Large Language Models: a Survey
Authors:
Alexander Rogiers,
Sander Noels,
Maarten Buyl,
Tijl De Bie
Abstract:
The rapid rise of Large Language Models (LLMs) has created new disruptive possibilities for persuasive communication, by enabling fully-automated personalized and interactive content generation at an unprecedented scale. In this paper, we survey the research field of LLM-based persuasion that has emerged as a result. We begin by exploring the different modes in which LLM Systems are used to influence human attitudes and behaviors. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. We identify key factors influencing their effectiveness, such as the manner of personalization and whether the content is labelled as AI-generated. We also summarize the experimental designs that have been used to evaluate progress. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks, including the spread of misinformation, the magnification of biases, and the invasion of privacy. These risks underscore the urgent need for ethical guidelines and updated regulatory frameworks to avoid the widespread deployment of irresponsible and harmful LLM Systems.
Submitted 11 November, 2024;
originally announced November 2024.