Showing 1–2 of 2 results for author: Gaere, E

  1. arXiv:2504.16155

    cs.NE

    DATETIME: A new benchmark to measure LLM translation and reasoning capabilities

    Authors: Edward Gaere, Florian Wangenheim

    Abstract: This paper introduces DATETIME, a new high-quality benchmark designed to evaluate the translation and reasoning abilities of a Large Language Model (LLM) on datetimes. A datetime is simply a date and a time, for example '11th.february.2023 ,1:12:31'. Datetimes are an interesting domain because they are intuitive and straightforward for humans to process but present significant challenges for LLMs.…

    Submitted 22 April, 2025; originally announced April 2025.

  2. A Self-Integration Testbed for Decentralized Socio-technical Systems

    Authors: Farzam Fanitabasi, Edward Gaere, Evangelos Pournaras

    Abstract: The Internet of Things comes along with new challenges for experimenting, testing, and operating decentralized socio-technical systems at large-scale. In such systems, autonomous agents interact locally with their users, and remotely with other agents to make intelligent collective choices. Via these interactions they self-regulate the consumption and production of distributed resources. While suc…

    Submitted 22 July, 2020; v1 submitted 6 February, 2020; originally announced February 2020.