-
Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center
Authors:
James Wen,
Sahil Nalawade,
Zhiwei Liang,
Catherine Bielick,
Marisa Ferrara Boston,
Alexander Chowdhury,
Adele Collin,
Luigi De Angelis,
Jacob Ellen,
Heather Frase,
Rodrigo R. Gameiro,
Juan Manuel Gutierrez,
Pooja Kadam,
Murat Keceli,
Srikanth Krishnamurthy,
Anne Kwok,
Yanan Lance Lu,
Heather Mattie,
Liam G. McCoy,
Katherine Miller,
Allison C. Morgan,
Marlene Louisa Moerig,
Trang Nguyen,
Alexander Owen-Post,
Alex D. Ruiz
, et al. (16 additional authors not shown)
Abstract:
Background: Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns. Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models, that is approved for enterprise use in research and operations. Given (1) the exceptionally broad adoption of the tool in our organization, (2) our research mission, and (3) the shared responsibility model required to benefit from Customer Copyright Commitment in Azure OpenAI Service products, we deemed rigorous copyright compliance testing necessary.
Case Description: We conducted a structured red teaming exercise in Nov. 2024, with 42 participants from academic, industry, and government institutions. Four teams attempted to extract copyrighted content from GPT4DFCI across four domains: literary works, news articles, scientific publications, and access-restricted clinical notes. Teams successfully extracted verbatim book dedications and near-exact passages through various strategies. News article extraction failed despite jailbreak attempts. Scientific article reproduction yielded only high-level summaries. Clinical note testing revealed appropriate privacy safeguards.
Discussion: The successful extraction of literary content indicates potential copyrighted material presence in training data, necessitating inference-time filtering. Differential success rates across content types suggest varying protective mechanisms. The event led to implementation of a copyright-specific meta-prompt in GPT4DFCI; this mitigation has been in production since Jan. 2025.
Conclusion: Systematic red teaming revealed specific vulnerabilities in generative AI copyright compliance, leading to concrete mitigation strategies. Academic medical institutions deploying generative AI should implement continuous testing protocols to ensure legal and ethical compliance.
Submitted 2 July, 2025; v1 submitted 26 June, 2025;
originally announced June 2025.
-
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Authors:
Boyu Gou,
Zanming Huang,
Yuting Ning,
Yu Gu,
Michael Lin,
Weijian Qi,
Andrei Kopanev,
Botao Yu,
Bernal Jiménez Gutiérrez,
Yiheng Shu,
Chan Hee Song,
Jiaman Wu,
Shijie Chen,
Hanane Nour Moussa,
Tianshu Zhang,
Jian Xie,
Yifei Li,
Tianci Xue,
Zeyi Liao,
Kai Zhang,
Boyuan Zheng,
Zhaowei Cai,
Viktor Rozgic,
Morteza Ziyadi,
Huan Sun
, et al. (1 additional author not shown)
Abstract:
Agentic search, such as Deep Research systems where agents autonomously browse the web, synthesize information, and return comprehensive citation-backed answers, represents a major shift in how users interact with web-scale information. While promising greater efficiency and cognitive offloading, the growing complexity and open-endedness of agentic search have outpaced existing evaluation benchmarks and methodologies, which largely assume short search horizons and static answers. In this paper, we introduce Mind2Web 2, a benchmark of 130 realistic, high-quality, and long-horizon tasks that require real-time web browsing and extensive information synthesis, constructed with over 1000 hours of human labor. To address the challenge of evaluating time-varying and complex answers, we propose a novel Agent-as-a-Judge framework. Our method constructs task-specific judge agents based on a tree-structured rubric design to automatically assess both answer correctness and source attribution. We conduct a comprehensive evaluation of ten frontier agentic search systems and human performance, along with a detailed error analysis to draw insights for future development. The best-performing system, OpenAI Deep Research, can already achieve 50-70% of human performance while spending half the time, highlighting its great potential. Altogether, Mind2Web 2 provides a rigorous foundation for developing and benchmarking the next generation of agentic search systems.
Submitted 3 July, 2025; v1 submitted 26 June, 2025;
originally announced June 2025.
-
Rate Analysis and Optimization of LoS Beyond Diagonal RIS-assisted MIMO Systems
Authors:
Ignacio Santamaria,
Jesus Gutierrez,
Mohammad Soleymani,
Eduard Jorswieck
Abstract:
In this letter, we derive an expression for the achievable rate in a multiple-input multiple-output (MIMO) system assisted by a beyond-diagonal reconfigurable intelligent surface (BD-RIS) when the channels to and from the BD-RIS are line-of-sight (LoS) while the direct link is non-line-of-sight (NLoS). The rate expression allows us to derive the optimal unitary and symmetric scattering BD-RIS matrix in closed form. Our simulation results show that the proposed solution is competitive even under the more usual Ricean channel fading model when the direct link is weak.
Submitted 10 April, 2025;
originally announced April 2025.
-
Interference Minimization in Beyond-Diagonal RIS-assisted MIMO Interference Channels
Authors:
Ignacio Santamaria,
Mohammad Soleymani,
Eduard Jorswieck,
Jesus Gutierrez
Abstract:
This paper proposes a two-stage approach for passive and active beamforming in multiple-input multiple-output (MIMO) interference channels (ICs) assisted by a beyond-diagonal reconfigurable intelligent surface (BD-RIS). In the first stage, the passive BD-RIS is designed to minimize the aggregate interference power at all receivers, a cost function called interference leakage (IL). To this end, we propose an optimization algorithm in the manifold of unitary matrices and a suboptimal but computationally efficient solution. In the second stage, users' active precoders are designed under different criteria such as minimizing the IL (min-IL), maximizing the signal-to-interference-plus-noise ratio (max-SINR), or maximizing the sum rate (max-SR). The residual interference not cancelled by the BD-RIS is treated as noise by the precoders. Our simulation results show that the max-SR precoders provide more than 20% sum rate improvement compared to other designs, especially when the BD-RIS has a moderate number of elements ($M<20$) and users transmit with high power, in which case the residual interference is still significant.
Submitted 25 March, 2025;
originally announced March 2025.
-
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
Authors:
Bernal Jiménez Gutiérrez,
Yiheng Shu,
Weijian Qi,
Sizhe Zhou,
Yu Su
Abstract:
Our ability to continuously acquire, organize, and leverage knowledge is a key feature of human intelligence that AI systems must approximate to unlock their full potential. Given the challenges in continual learning with large language models (LLMs), retrieval-augmented generation (RAG) has become the dominant way to introduce new information. However, its reliance on vector retrieval hinders its ability to mimic the dynamic and interconnected nature of human long-term memory. Recent RAG approaches augment vector embeddings with various structures like knowledge graphs to address some of these gaps, namely sense-making and associativity. However, their performance on more basic factual memory tasks drops considerably below standard RAG. We address this unintended deterioration and propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks. HippoRAG 2 builds upon the Personalized PageRank algorithm used in HippoRAG and enhances it with deeper passage integration and more effective online use of an LLM. This combination pushes this RAG system closer to the effectiveness of human long-term memory, achieving a 7% improvement in associative memory tasks over the state-of-the-art embedding model while also exhibiting superior factual knowledge and sense-making memory capabilities. This work paves the way for non-parametric continual learning for LLMs. Code and data are available at https://github.com/OSU-NLP-Group/HippoRAG.
Submitted 19 June, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Are Deep Learning Methods Suitable for Downscaling Global Climate Projections? Review and Intercomparison of Existing Models
Authors:
Jose González-Abad,
José Manuel Gutiérrez
Abstract:
Deep Learning (DL) has shown promise for downscaling global climate change projections under different approaches, including Perfect Prognosis (PP) and Regional Climate Model (RCM) emulation. Unlike emulators, PP downscaling models are trained on observational data, so it remains an open question whether they can plausibly extrapolate unseen conditions and changes in future emissions scenarios. Here we focus on this problem as the main drawback for the operationalization of these methods and present the results of 1) a literature review to identify state-of-the-art DL models for PP downscaling and 2) an intercomparison experiment to evaluate the performance of these models and to assess their extrapolation capability using a common experimental framework, taking into account the sensitivity of results to different training replicas. We focus on minimum and maximum temperatures and precipitation over Spain, a region with a range of climatic conditions with different influential regional processes. We conclude with a discussion of the findings, limitations of existing methods, and prospects for future development.
Submitted 6 November, 2024;
originally announced November 2024.
-
Transformer based super-resolution downscaling for regional reanalysis: Full domain vs tiling approaches
Authors:
Antonio Pérez,
Mario Santa Cruz,
Daniel San Martín,
José Manuel Gutiérrez
Abstract:
Super-resolution (SR) is a promising cost-effective downscaling methodology for producing high-resolution climate information from coarser counterparts. A particular application is downscaling regional reanalysis outputs (predictand) from the driving global counterparts (predictor). This study conducts an intercomparison of various SR downscaling methods focusing on temperature and using the CERRA reanalysis (5.5 km resolution, produced with a regional atmospheric model driven by ERA5) as example. The method proposed in this work is the Swin transformer and two alternative methods are used as benchmark (fully convolutional U-Net and convolutional and dense DeepESD) as well as the simple bicubic interpolation. We compare two approaches, the standard one using the full domain as input and a more scalable tiling approach, dividing the full domain into tiles that are used as input. The methods are trained to downscale CERRA surface temperature, based on temperature information from the driving ERA5; in addition, the tiling approach includes static orographic information. We show that the tiling approach, which requires spatial transferability, comes at the cost of a lower performance (although it outperforms some full-domain benchmarks), but provides an efficient scalable solution that allows SR reduction on a pan-European scale and is valuable for real-time applications.
Submitted 16 October, 2024;
originally announced October 2024.
-
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
Authors:
Shijie Chen,
Bernal Jiménez Gutiérrez,
Yu Su
Abstract:
Information retrieval (IR) systems have played a vital role in modern digital life and have cemented their continued usefulness in this new era of generative AI via retrieval-augmented generation. With strong language processing capabilities and remarkable versatility, large language models (LLMs) have become popular choices for zero-shot re-ranking in IR systems. So far, LLM-based re-ranking methods rely on strong generative capabilities, which restricts their use to either specialized or powerful proprietary models. Given these restrictions, we ask: is autoregressive generation necessary and optimal for LLMs to perform re-ranking? We hypothesize that there are abundant signals relevant to re-ranking within LLMs that might not be used to their full potential via generation. To more directly leverage such signals, we propose in-context re-ranking (ICR), a novel method that leverages the change in attention pattern caused by the search query for accurate and efficient re-ranking. To mitigate the intrinsic biases in LLMs, we propose a calibration method using a content-free query. Due to the absence of generation, ICR only requires two ($O(1)$) forward passes to re-rank $N$ documents, making it substantially more efficient than generative re-ranking methods that require at least $O(N)$ forward passes. Our novel design also enables ICR to be applied to any LLM without specialized training while guaranteeing a well-formed ranking. Extensive experiments with two popular open-weight LLMs on standard single-hop and multi-hop information retrieval benchmarks show that ICR outperforms RankGPT while cutting the latency by more than 60% in practice. Through detailed analyses, we show that ICR's performance is especially strong on tasks that require more complex re-ranking signals. Our findings call for further exploration of novel ways of utilizing open-weight LLMs beyond text generation.
Submitted 28 February, 2025; v1 submitted 3 October, 2024;
originally announced October 2024.
-
Towards a conjecture of Woodall for partial 3-trees
Authors:
Juan Gutiérrez
Abstract:
In 1978, Woodall conjectured the following: in a planar digraph, the size of a shortest dicycle is equal to the maximum cardinality of a collection of disjoint transversals of dicycles. We prove that this conjecture is true when the underlying graph is a planar 3-tree.
Submitted 11 August, 2024;
originally announced August 2024.
-
Brief state of the art in social information mining: Practical application in analysis of trends in French legislative 2024
Authors:
Jose A. Garcia Gutierrez
Abstract:
The analysis of social media information has undergone significant evolution in the last decade due to advancements in artificial intelligence (AI) and machine learning (ML). This paper provides an overview of the state-of-the-art techniques in social media mining, with a practical application in analyzing trends in the 2024 French legislative elections. We leverage natural language processing (NLP) tools to gauge public opinion by extracting and analyzing comments and reactions from the AgoraVox platform. The study reveals that the National Rally party, led by Marine Le Pen, maintains a high level of engagement on social media, outperforming traditional parties. This trend is corroborated by user interactions, indicating a strong digital presence. The results highlight the utility of advanced AI models, such as transformers and large language models (LLMs), in capturing nuanced public sentiments and predicting political leanings, demonstrating their potential in real-time reputation management and crisis response.
Submitted 11 July, 2024;
originally announced August 2024.
-
MIMO Capacity Maximization with Beyond-Diagonal RIS
Authors:
Ignacio Santamaria,
Mohammad Soleymani,
Eduard Jorswieck,
Jesús Gutiérrez
Abstract:
This paper addresses the problem of maximizing the capacity of a multiple-input multiple-output (MIMO) link assisted by a beyond-diagonal reconfigurable intelligent surface (BD-RIS). We maximize the capacity by alternately optimizing the transmit covariance matrix, and the BD-RIS scattering matrix, which, according to network theory, should be unitary and symmetric. These constraints make the optimization of BD-RIS more challenging than that of diagonal RIS. To find a stationary point of the capacity we maximize a sequence of quadratic problems in the manifold of unitary matrices. This leads to an efficient algorithm that always improves the capacity obtained by a diagonal RIS. Through simulation examples, we study the capacity improvement provided by a passive BD-RIS architecture over the conventional RIS model in which the phase shift matrix is diagonal.
Submitted 4 June, 2024;
originally announced June 2024.
-
What is a Goldilocks Face Verification Test Set?
Authors:
Haiyu Wu,
Sicong Tian,
Aman Bhatta,
Jacob Gutierrez,
Grace Bezold,
Genesis Argueta,
Karl Ricanek Jr.,
Michael C. King,
Kevin W. Bowyer
Abstract:
Face Recognition models are commonly trained with web-scraped datasets containing millions of images and evaluated on test sets emphasizing pose, age and mixed attributes. With train and test sets both assembled from web-scraped images, it is critical to ensure disjoint sets of identities between train and test sets. However, existing train and test sets have not considered this. Moreover, as accuracy levels become saturated, such as LFW $>99.8\%$, more challenging test sets are needed. We show that current train and test sets are generally not identity- or even image-disjoint, and that this results in an optimistic bias in the estimated accuracy. In addition, we show that identity-disjoint folds are important in the 10-fold cross-validation estimate of test accuracy. To better support continued advances in face recognition, we introduce two "Goldilocks" test sets, Hadrian and Eclipse. The former emphasizes challenging facial hairstyles and the latter emphasizes challenging over- and under-exposure conditions. Images in both datasets are from a large, controlled-acquisition (not web-scraped) dataset, so they are identity- and image-disjoint with all popular training sets. Accuracy for these new test sets generally falls below that observed on LFW, CPLFW, CALFW, CFP-FP and AgeDB-30, showing that these datasets represent important dimensions for improvement of face recognition. The datasets are available at: https://github.com/HaiyuWu/SOTA-Face-Recognition-Train-and-Test
Submitted 24 May, 2024;
originally announced May 2024.
-
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Authors:
Bernal Jiménez Gutiérrez,
Yiheng Shu,
Yu Gu,
Michihiro Yasunaga,
Yu Su
Abstract:
In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integrate a large amount of new experiences after pre-training. In this work, we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory to enable deeper and more efficient knowledge integration over new experiences. HippoRAG synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of neocortex and hippocampus in human memory. We compare HippoRAG with existing RAG methods on multi-hop question answering and show that our method outperforms the state-of-the-art methods remarkably, by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval like IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that our method can tackle new types of scenarios that are out of reach of existing methods. Code and data are available at https://github.com/OSU-NLP-Group/HippoRAG.
Submitted 14 January, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
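The abstract above names the Personalized PageRank algorithm as one ingredient of HippoRAG. As a rough illustration of that algorithm only (not of HippoRAG's own pipeline or code), here is a minimal power-iteration sketch. The function name `personalized_pagerank`, the toy graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for the example.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=100):
    """Power iteration for Personalized PageRank.

    graph: dict mapping each node to a list of nodes it links to.
    seeds: dict mapping seed nodes to non-negative restart weights.
    Illustrative sketch only; parameter values are assumptions.
    """
    nodes = list(graph)
    total = sum(seeds.values())
    # Restart distribution: probability mass concentrated on the seed nodes.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Otherwise it follows an outgoing edge uniformly at random.
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the restart distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
        rank = nxt
    return rank


# Hypothetical toy graph: a path a - b - c - d with edges in both directions.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ranks = personalized_pagerank(toy_graph, {"a": 1.0})
```

In this toy run the scores sum to one and decay with distance from the seed's neighbourhood, so the far end of the path (`d`) scores lower than the seed (`a`).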
-
On Tuza's Conjecture in Dense Graphs
Authors:
Luis Chahua,
Juan Gutierrez
Abstract:
In 1982, Tuza conjectured that the size $τ(G)$ of a minimum set of edges that intersects every triangle of a graph $G$ is at most twice the size $ν(G)$ of a maximum set of edge-disjoint triangles of $G$. This conjecture has been proved for several graph classes. In this paper, we present three results regarding Tuza's Conjecture for dense graphs. By using a probabilistic argument, Tuza proved his conjecture for graphs on $n$ vertices with minimum degree at least $\frac{7n}{8}$. We extend this technique to show that Tuza's conjecture is valid for split graphs with minimum degree at least $\frac{3n}{5}$; and that $τ(G) < \frac{28}{15}ν(G)$ for every tripartite graph with minimum degree more than $\frac{33n}{56}$. Finally, we show that $τ(G)\leq \frac{3}{2}ν(G)$ when $G$ is a complete 4-partite graph. Moreover, this bound is tight.
Submitted 18 May, 2024;
originally announced May 2024.
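To make the two quantities in Tuza's conjecture concrete, the following brute-force sketch computes $τ(G)$ and $ν(G)$ for very small graphs. It is exponential-time and purely illustrative; it is not the probabilistic technique described in the abstract, and the helper names `triangle_edge_sets`, `tau`, and `nu` are assumptions made for this example.

```python
from itertools import combinations


def triangle_edge_sets(edges):
    """Return each triangle of the graph as a frozenset of its three edges."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edge_set for v in e})
    result = []
    for trio in combinations(vertices, 3):
        sides = [frozenset(p) for p in combinations(trio, 2)]
        if all(s in edge_set for s in sides):
            result.append(frozenset(sides))
    return result


def tau(edges):
    """Minimum number of edges that intersects every triangle (brute force)."""
    tris = triangle_edge_sets(edges)
    edge_list = list({frozenset(e) for e in edges})
    for k in range(len(edge_list) + 1):
        for chosen in combinations(edge_list, k):
            chosen = set(chosen)
            # Every triangle must contain at least one chosen edge.
            if all(t & chosen for t in tris):
                return k


def nu(edges):
    """Maximum number of pairwise edge-disjoint triangles (brute force)."""
    tris = triangle_edge_sets(edges)
    for k in range(len(tris), -1, -1):
        for group in combinations(tris, k):
            # No two triangles in the group may share an edge.
            if all(not (s & t) for s, t in combinations(group, 2)):
                return k


# Complete graph on 4 vertices, a standard small example.
k4_edges = list(combinations(range(4), 2))
k4_tau, k4_nu = tau(k4_edges), nu(k4_edges)
```

For the complete graph $K_4$ this gives $τ = 2$ and $ν = 1$, so the conjectured bound $τ(G) \leq 2ν(G)$ holds with equality there.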
-
Identity Overlap Between Face Recognition Train/Test Data: Causing Optimistic Bias in Accuracy Measurement
Authors:
Haiyu Wu,
Sicong Tian,
Jacob Gutierrez,
Aman Bhatta,
Kağan Öztürk,
Kevin W. Bowyer
Abstract:
A fundamental tenet of pattern recognition is that overlap between training and testing sets causes an optimistic accuracy estimate. Deep CNNs for face recognition are trained for N-way classification of the identities in the training set. Accuracy is commonly estimated as average 10-fold classification accuracy on image pairs from test sets such as LFW, CALFW, CPLFW, CFP-FP and AgeDB-30. Because train and test sets have been independently assembled, images and identities in any given test set may also be present in any given training set. In particular, our experiments reveal a surprising degree of identity and image overlap between the LFW family of test sets and the MS1MV2 training set. Our experiments also reveal identity label noise in MS1MV2. We compare accuracy achieved with same-size MS1MV2 subsets that are identity-disjoint and not identity-disjoint with LFW, to reveal the size of the optimistic bias. Using more challenging test sets from the LFW family, we find that the size of the optimistic bias is larger for more challenging test sets. Our results highlight the lack of and the need for identity-disjoint train and test methodology in face recognition research.
Submitted 15 May, 2024;
originally announced May 2024.
-
Domination and packing in graphs
Authors:
Renzo Gómez,
Juan Gutiérrez
Abstract:
Given a graph $G$, the domination number, denoted by $γ(G)$, is the minimum cardinality of a dominating set in $G$. Dual to the notion of domination number is the packing number of a graph. A packing of $G$ is a set of vertices whose pairwise distance is at least three. The packing number $ρ(G)$ of $G$ is the maximum cardinality of one such set. Furthermore, the inequality $ρ(G) \leq γ(G)$ is well-known. Henning et al. conjectured that $γ(G) \leq 2ρ(G)+1$ if $G$ is subcubic. In this paper, we progress towards this conjecture by showing that $γ(G) \leq \frac{120}{49}ρ(G)$ if $G$ is a bipartite cubic graph. We also show that $γ(G) \leq 3ρ(G)$ if $G$ is a maximal outerplanar graph, and that $γ(G) \leq 2ρ(G)$ if $G$ is a biconvex graph. Moreover, in the last case, we show that this upper bound is tight.
Submitted 8 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
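The definitions of the domination number and the packing number in the abstract above can be checked by exhaustive search on small graphs. The sketch below is a brute-force illustration under that assumption, not the paper's proof technique. The helper names and the choice of the Petersen graph as a test case are made up for this example.

```python
from collections import deque
from itertools import combinations


def distances_from(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def domination_number(adj):
    """Smallest set whose closed neighbourhoods cover every vertex."""
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        for cand in combinations(nodes, k):
            covered = set(cand)
            for u in cand:
                covered.update(adj[u])
            if len(covered) == len(nodes):
                return k


def packing_number(adj):
    """Largest set of vertices with pairwise distance at least three."""
    nodes = list(adj)
    dist = {u: distances_from(adj, u) for u in nodes}
    for k in range(len(nodes), 0, -1):
        for cand in combinations(nodes, k):
            if all(dist[u].get(v, float("inf")) >= 3
                   for u, v in combinations(cand, 2)):
                return k


def petersen_graph():
    """Adjacency sets of the Petersen graph (a cubic graph on 10 vertices)."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        # Outer cycle edge, spoke, and inner pentagram edge.
        for u, v in ((i, (i + 1) % 5), (i, i + 5), (i + 5, 5 + (i + 2) % 5)):
            adj[u].add(v)
            adj[v].add(u)
    return adj


petersen = petersen_graph()
gamma, rho = domination_number(petersen), packing_number(petersen)
```

On the Petersen graph this yields a domination number of 3 and a packing number of 1, which meets the bound conjectured by Henning et al. for subcubic graphs with equality. The Petersen graph is cubic but not bipartite, so the bounds proved in the paper for bipartite cubic, maximal outerplanar, and biconvex graphs do not apply to it.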
-
Characterising and Verifying the Core in Concurrent Multi-Player Mean-Payoff Games (Full Version)
Authors:
Julian Gutierrez,
Anthony W. Lin,
Muhammad Najib,
Thomas Steeples,
Michael Wooldridge
Abstract:
Concurrent multi-player mean-payoff games are important models for systems of agents with individual, non-dichotomous preferences. Whilst these games have been extensively studied in terms of their equilibria in non-cooperative settings, this paper explores an alternative solution concept: the core from cooperative game theory. This concept is particularly relevant for cooperative AI systems, as it enables the modelling of cooperation among agents, even when their goals are not fully aligned. Our contribution is twofold. First, we provide a characterisation of the core using discrete geometry techniques and establish a necessary and sufficient condition for its non-emptiness. We then use the characterisation to prove the existence of polynomial witnesses in the core. Second, we use the existence of such witnesses to solve key decision problems in rational verification and provide tight complexity bounds for the problem of checking whether some/every equilibrium in a game satisfies a given LTL or GR(1) specification. Our approach is general and can be adapted to handle other specifications expressed in various fragments of LTL without incurring additional computational costs.
Submitted 27 November, 2023;
originally announced November 2023.
-
Solving the Right Problem is Key for Translational NLP: A Case Study in UMLS Vocabulary Insertion
Authors:
Bernal Jimenez Gutierrez,
Yuqing Mao,
Vinh Nguyen,
Kin Wah Fung,
Yu Su,
Olivier Bodenreider
Abstract:
As the immense opportunities enabled by large language models become more apparent, NLP systems will be increasingly expected to excel in real-world settings. However, in many instances, powerful models alone will not yield translational NLP solutions, especially if the formulated problem is not well aligned with the real-world task. In this work, we study the case of UMLS vocabulary insertion, an important real-world task in which hundreds of thousands of new terms, referred to as atoms, are added to the UMLS, one of the most comprehensive open-source biomedical knowledge bases. Previous work aimed to develop an automated NLP system to make this time-consuming, costly, and error-prone task more efficient. Nevertheless, practical progress in this direction has been difficult to achieve due to a problem formulation and evaluation gap between research output and the real-world task. In order to address this gap, we introduce a new formulation for UMLS vocabulary insertion which mirrors the real-world task, datasets which faithfully represent it and several strong baselines we developed through re-purposing existing solutions. Additionally, we propose an effective rule-enhanced biomedical language model which enables important new model behavior, outperforms all strong baselines and provides measurable qualitative improvements to editors who carry out the UVI task. We hope this case study provides insight into the considerable importance of problem formulation for the success of translational NLP solutions.
Submitted 25 November, 2023;
originally announced November 2023.
-
Transferability and explainability of deep learning emulators for regional climate model projections: Perspectives for future applications
Authors:
Jorge Bano-Medina,
Maialen Iturbide,
Jesus Fernandez,
Jose Manuel Gutierrez
Abstract:
Regional climate models (RCMs) are essential tools for simulating and studying regional climate variability and change. However, their high computational cost limits the production of comprehensive ensembles of regional climate projections covering multiple scenarios and driving Global Climate Models (GCMs) across regions. RCM emulators based on deep learning models have recently been introduced as a cost-effective and promising alternative that requires only short RCM simulations to train the models. Therefore, evaluating their transferability to different periods, scenarios, and GCMs becomes a pivotal and complex task in which the inherent biases of both GCMs and RCMs play a significant role. Here we focus on this problem by considering the two different emulation approaches proposed in the literature (PP and MOS, following the terminology introduced in this paper). In addition to standard evaluation techniques, we expand the analysis with methods from the field of eXplainable Artificial Intelligence (XAI), to assess the physical consistency of the empirical links learnt by the models. We find that both approaches are able to emulate certain climatological properties of RCMs for different periods and scenarios (soft transferability), but the consistency of the emulation functions differs between approaches. Whereas PP learns robust and physically meaningful patterns, MOS results are GCM-dependent and lack physical consistency in some cases. Both approaches face problems when transferring the emulation function to other GCMs, due to the existence of GCM-dependent biases (hard transferability). This limits their applicability to build ensembles of regional climate projections. We conclude by giving some prospects for future applications.
Submitted 31 October, 2023;
originally announced November 2023.
-
On two conjectures about the intersection of longest paths and cycles
Authors:
Juan Gutiérrez,
Christian Valqui
Abstract:
A conjecture attributed to Smith states that every pair of longest cycles in a $k$-connected graph intersect each other in at least $k$ vertices. In this paper, we show that every pair of longest cycles in a $k$-connected graph on $n$ vertices intersect each other in at least $\min\{n,8k-n-16\}$ vertices, which confirms Smith's conjecture when $k\geq (n+16)/7$. An analogous conjecture for paths instead of cycles was stated by Hippchen. By a simple reduction, we relate both conjectures, showing that Hippchen's conjecture is valid when either $k \leq 6$ or $k \geq (n+9)/7$.
Submitted 5 October, 2023;
originally announced October 2023.
-
Independent dominating sets in planar triangulations
Authors:
Fábio Botler,
Cristina G. Fernandes,
Juan Gutiérrez
Abstract:
In 1996, Matheson and Tarjan proved that every near planar triangulation on $n$ vertices contains a dominating set of size at most $n/3$, and conjectured that this upper bound can be reduced to $n/4$ for planar triangulations when $n$ is sufficiently large. In this paper, we consider the analogous problem for independent dominating sets: What is the minimum $ε$ for which every near planar triangulation on $n$ vertices contains an independent dominating set of size at most $εn$? We prove that $2/7 \leq ε\leq 5/12$. Moreover, this upper bound can be improved to $3/8$ for planar triangulations, and to $1/3$ for planar triangulations with minimum degree 5.
Submitted 4 August, 2023;
originally announced August 2023.
-
Multi-variable Hard Physical Constraints for Climate Model Downscaling
Authors:
Jose González-Abad,
Álex Hernández-García,
Paula Harder,
David Rolnick,
José Manuel Gutiérrez
Abstract:
Global Climate Models (GCMs) are the primary tool to simulate climate evolution and assess the impacts of climate change. However, they often operate at a coarse spatial resolution that limits their accuracy in reproducing local-scale phenomena. Statistical downscaling methods leveraging deep learning offer a solution to this problem by approximating local-scale climate fields from coarse variables, thus enabling regional GCM projections. Typically, climate fields of different variables of interest are downscaled independently, resulting in violations of fundamental physical properties across interconnected variables. This study investigates the scope of this problem and, through an application on temperature, lays the foundation for a framework introducing multi-variable hard constraints that guarantees physical relationships between groups of downscaled climate variables.
Submitted 2 August, 2023;
originally announced August 2023.
-
Permutation and local permutation polynomial of maximum degree
Authors:
Jaime Gutierrez,
Jorge Jimenez Urroz
Abstract:
Let $F_q$ be the finite field with $q$ elements and $F_q[x_1,\ldots, x_n]$ the ring of polynomials in $n$ variables over $F_q$. In this paper we consider permutation polynomials and local permutation polynomials over $F_q[x_1,\ldots, x_n]$, which define interesting generalizations of permutations over finite fields. We are able to construct permutation polynomials in $F_q[x_1,\ldots, x_n]$ of maximum degree $n(q-1)-1$ and local permutation polynomials in $F_q[x_1,\ldots, x_n]$ of maximum degree $n(q-2)$ when $q>3$, extending previous results.
Submitted 29 August, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
SNR Maximization in Beyond Diagonal RIS-assisted Single and Multiple Antenna Links
Authors:
Ignacio Santamaria,
Mohammad Soleymani,
Eduard Jorswieck,
Jesus Gutierrez
Abstract:
Reconfigurable intelligent surface (RIS) architectures not limited to diagonal phase shift matrices have recently been considered to increase their flexibility in shaping the wireless channel. One of these beyond-diagonal RIS or BD-RIS architectures leads to a unitary and symmetric RIS matrix. In this letter, we consider the problem of maximizing the signal-to-noise ratio (SNR) in single and multiple antenna links assisted by a BD-RIS. The Max-SNR problem admits a closed-form solution based on the Takagi factorization of a certain complex and symmetric matrix. This allows us to solve the max-SNR problem for SISO, SIMO, and MISO channels.
Submitted 20 July, 2023;
originally announced July 2023.
-
Incentive Engineering for Concurrent Games
Authors:
David Hyland,
Julian Gutierrez,
Michael Wooldridge
Abstract:
We consider the problem of incentivising desirable behaviours in multi-agent systems by way of taxation schemes. Our study employs the concurrent games model: in this model, each agent is primarily motivated to seek the satisfaction of a goal, expressed as a Linear Temporal Logic (LTL) formula; secondarily, agents seek to minimise costs, where costs are imposed based on the actions taken by agents in different states of the game. In this setting, we consider an external principal who can influence agents' preferences by imposing taxes (additional costs) on the actions chosen by agents in different states. The principal imposes taxation schemes to motivate agents to choose a course of action that will lead to the satisfaction of their goal, also expressed as an LTL formula. However, taxation schemes are limited in their ability to influence agents' preferences: an agent will always prefer to satisfy its goal rather than otherwise, no matter what the costs. The fundamental question that we study is whether the principal can impose a taxation scheme such that, in the resulting game, the principal's goal is satisfied in at least one or all runs of the game that could arise by agents choosing to follow game-theoretic equilibrium strategies. We consider two different types of taxation schemes: in a static scheme, the same tax is imposed on a state-action profile pair in all circumstances, while in a dynamic scheme, the principal can choose to vary taxes depending on the circumstances. We investigate the main game-theoretic properties of this model as well as the computational complexity of the relevant decision problems.
Submitted 11 July, 2023;
originally announced July 2023.
-
Biomedical Language Models are Robust to Sub-optimal Tokenization
Authors:
Bernal Jiménez Gutiérrez,
Huan Sun,
Yu Su
Abstract:
As opposed to general English, many concepts in biomedical terminology have been designed in recent history by biomedical professionals with the goal of being precise and concise. This is often achieved by concatenating meaningful biomedical morphemes to create new semantic units. Nevertheless, most modern biomedical language models (LMs) are pre-trained using standard domain-specific tokenizers derived from large scale biomedical corpus statistics without explicitly leveraging the agglutinating nature of biomedical language. In this work, we first find that standard open-domain and biomedical tokenizers are largely unable to segment biomedical terms into meaningful components. Therefore, we hypothesize that using a tokenizer which segments biomedical terminology more accurately would enable biomedical LMs to improve their performance on downstream biomedical NLP tasks, especially ones which involve biomedical terms directly such as named entity recognition (NER) and entity linking. Surprisingly, we find that pre-training a biomedical LM using a more accurate biomedical tokenizer does not improve the entity representation quality of a language model as measured by several intrinsic and extrinsic measures such as masked language modeling prediction (MLM) accuracy as well as NER and entity linking performance. These quantitative findings, along with a case study which explores entity representation quality more directly, suggest that the biomedical pre-training process is quite robust to instances of sub-optimal tokenization.
Submitted 10 July, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Designing Equilibria in Concurrent Games with Social Welfare and Temporal Logic Constraints
Authors:
Julian Gutierrez,
Muhammad Najib,
Giuseppe Perelli,
Michael Wooldridge
Abstract:
In game theory, mechanism design is concerned with the design of incentives so that a desired outcome of the game can be achieved. In this paper, we explore the concept of equilibrium design, where incentives are designed to obtain a desirable equilibrium that satisfies a specific temporal logic property. Our study is based on a framework where system specifications are represented as temporal logic formulae, games as quantitative concurrent game structures, and players' goals as mean-payoff objectives. We consider system specifications given by LTL and GR(1) formulae, and show that designing incentives to ensure that a given temporal logic property is satisfied on some/every Nash equilibrium of the game can be achieved in PSPACE for LTL properties and in NP/$Σ^P_2$ for GR(1) specifications. We also examine the complexity of related decision and optimisation problems, such as optimality and uniqueness of solutions, as well as considering social welfare, and show that the complexities of these problems lie within the polynomial hierarchy. Equilibrium design can be used as an alternative solution to rational synthesis and verification problems for concurrent games with mean-payoff objectives when no solution exists or as a technique to repair concurrent games with undesirable Nash equilibria in an optimal way.
Submitted 3 December, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
Authors:
Kai Zhang,
Bernal Jiménez Gutiérrez,
Yu Su
Abstract:
Recent work has shown that fine-tuning large language models (LLMs) on large-scale instruction-following datasets substantially improves their performance on a wide range of NLP tasks, especially in the zero-shot setting. However, even advanced instruction-tuned LLMs still fail to outperform small LMs on relation extraction (RE), a fundamental information extraction task. We hypothesize that instruction-tuning has been unable to elicit strong RE capabilities in LLMs due to RE's low incidence in instruction-tuning datasets, making up less than 1% of all tasks (Wang et al., 2022). To address this limitation, we propose QA4RE, a framework that aligns RE with question answering (QA), a predominant task in instruction-tuning datasets. Comprehensive zero-shot RE experiments over four datasets with two series of instruction-tuned LLMs (six LLMs in total) demonstrate that our QA4RE framework consistently improves LLM performance, strongly verifying our hypothesis and enabling LLMs to outperform strong zero-shot baselines by a large margin. Additionally, we provide thorough experiments and discussions to show the robustness, few-shot effectiveness, and strong transferability of our QA4RE framework. This work illustrates a promising way of adapting LLMs to challenging and underrepresented tasks by aligning these tasks with more common instruction-tuning tasks like QA.
Submitted 18 May, 2023;
originally announced May 2023.
-
Principal-Agent Boolean Games
Authors:
David Hyland,
Julian Gutierrez,
Michael Wooldridge
Abstract:
We introduce and study a computational version of the principal-agent problem -- a classic problem in Economics that arises when a principal desires to contract an agent to carry out some task, but has incomplete information about the agent or their subsequent actions. The key challenge in this setting is for the principal to design a contract for the agent such that the agent's preferences are then aligned with those of the principal. We study this problem using a variation of Boolean games, where multiple players each choose valuations for Boolean variables under their control, seeking the satisfaction of a personal goal, given as a Boolean logic formula. In our setting, the principal can only observe some subset of these variables, and the principal chooses a contract which rewards players on the basis of the assignments they make for the variables that are observable to the principal. The principal's challenge is to design a contract so that, firstly, the principal's goal is achieved in some or all Nash equilibrium choices, and secondly, that the principal is able to verify that their goal is satisfied. In this paper, we formally define this problem and completely characterise the computational complexity of the most relevant decision problems associated with it.
Submitted 17 May, 2023;
originally announced May 2023.
-
Interference Leakage Minimization in RIS-assisted MIMO Interference Channels
Authors:
Ignacio Santamaria,
Mohammad Soleymani,
Eduard Jorswieck,
Jesus Gutierrez
Abstract:
We address the problem of interference leakage (IL) minimization in the $K$-user multiple-input multiple-output (MIMO) interference channel (IC) assisted by a reconfigurable intelligent surface (RIS). We describe an iterative algorithm based on block coordinate descent to minimize the IL cost function. A reformulation of the problem provides a geometric interpretation and shows interesting connections with envelope precoding and phase-only zero-forcing beamforming problems. As a result of this analysis, we derive a set of necessary (but not sufficient) conditions for a phase-optimized RIS to be able to perfectly cancel the interference on the $K$-user MIMO IC.
Submitted 6 March, 2023;
originally announced March 2023.
-
k-Prize Weighted Voting Games
Authors:
Wei-Chen Lee,
David Hyland,
Alessandro Abate,
Edith Elkind,
Jiarui Gan,
Julian Gutierrez,
Paul Harrenstein,
Michael Wooldridge
Abstract:
We introduce a natural variant of weighted voting games, which we refer to as k-Prize Weighted Voting Games. Such games consist of n players with weights, and k prizes, of possibly differing values. The players form coalitions, and the i-th largest coalition (by the sum of weights of its members) wins the i-th largest prize, which is then shared among its members. We present four solution concepts to analyse the games in this class, and characterise the existence of stable outcomes in games with three players and two prizes, and in games with uniform prizes. We then explore the efficiency of stable outcomes in terms of Pareto optimality and utilitarian social welfare. Finally, we study the computational complexity of finding stable outcomes.
Submitted 2 March, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Using Explainability to Inform Statistical Downscaling Based on Deep Learning Beyond Standard Validation Approaches
Authors:
Jose González-Abad,
Jorge Baño-Medina,
José Manuel Gutiérrez
Abstract:
Deep learning (DL) has emerged as a promising tool to downscale climate projections at regional-to-local scales from large-scale atmospheric fields following the perfect-prognosis (PP) approach. Given their complexity, it is crucial to properly evaluate these methods, especially when applied to changing climatic conditions where the ability to extrapolate/generalise is key. In this work, we intercompare several DL models extracted from the literature for the same challenging use-case (downscaling temperature in the CORDEX North America domain) and expand standard evaluation methods building on eXplainable artificial intelligence (XAI) techniques. We show how these techniques can be used to unravel the internal behaviour of these models, providing new evaluation dimensions and aiding in their diagnostic and design. These results show the usefulness of incorporating XAI techniques into statistical downscaling evaluation frameworks, especially when working with large regions and/or under climate change conditions.
Submitted 3 February, 2023;
originally announced February 2023.
-
Cooperative Concurrent Games
Authors:
Julian Gutierrez,
Szymon Kowara,
Sarit Kraus,
Thomas Steeples,
Michael Wooldridge
Abstract:
In rational verification, the aim is to verify which temporal logic properties will obtain in a multi-agent system, under the assumption that agents ("players") in the system choose strategies for acting that form a game theoretic equilibrium. Preferences are typically defined by assuming that agents act in pursuit of individual goals, specified as temporal logic formulae. To date, rational verification has been studied using non-cooperative solution concepts - Nash equilibrium and refinements thereof. Such non-cooperative solution concepts assume that there is no possibility of agents forming binding agreements to cooperate, and as such they are restricted in their applicability. In this article, we extend rational verification to cooperative solution concepts, as studied in the field of cooperative game theory. We focus on the core, as this is the most fundamental (and most widely studied) cooperative solution concept. We begin by presenting a variant of the core that seems well-suited to the concurrent game setting, and we show that this version of the core can be characterised using ATL*. We then study the computational complexity of key decision problems associated with the core, which range from problems in PSPACE to problems in 3EXPTIME. We also investigate conditions that are sufficient to ensure that the core is non-empty, and explore when it is invariant under bisimilarity. We then introduce and study a number of variants of the main definition of the core, leading to the issue of credible deviations, and to stronger notions of collective stable behaviour. Finally, we study cooperative rational verification using an alternative model of preferences, in which players seek to maximise the mean-payoff they obtain over an infinite play in games where quantitative information is allowed.
Submitted 15 January, 2023;
originally announced January 2023.
-
On Tuza's conjecture in co-chain graphs
Authors:
Luis Chahua,
Juan Gutiérrez
Abstract:
In 1981, Tuza conjectured that the cardinality of a minimum set of edges that intersects every triangle of a graph is at most twice the cardinality of a maximum set of edge-disjoint triangles. This conjecture has been proved for several important graph classes, such as planar graphs and tripartite graphs, among others. However, it remains open for other important classes of graphs, such as chordal graphs. Furthermore, it remains open for the main subclasses of chordal graphs, such as split graphs and interval graphs. In this paper, we show that Tuza's conjecture is valid for co-chain graphs with an even number of vertices on both sides of the partition, a known subclass of interval graphs.
Submitted 18 July, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Application of federated learning techniques for arrhythmia classification using 12-lead ECG signals
Authors:
Daniel Mauricio Jimenez Gutierrez,
Hafiz Muuhammad Hassan,
Lorella Landi,
Andrea Vitaletti,
Ioannis Chatzigiannakis
Abstract:
Artificial Intelligence-based (AI) analysis of large, curated medical datasets is promising for providing early detection, faster diagnosis, and more effective treatment using information from low-power Electrocardiography (ECG) monitoring devices. However, accessing sensitive medical data from diverse sources is highly restricted since improper use, unsafe storage, or data leakage could violate a person's privacy. This work uses a Federated Learning (FL) privacy-preserving methodology to train AI models over heterogeneous sets of high-definition ECG from 12-lead sensor arrays collected from six heterogeneous sources. We evaluated the capacity of the resulting models to achieve equivalent performance compared to state-of-the-art models trained in a Centralized Learning (CL) fashion. Moreover, we assessed the performance of our solution over Independent and Identically Distributed (IID) and non-IID federated data. Our methodology involves machine learning techniques based on Deep Neural Networks and Long-Short-Term Memory models. It has a robust data preprocessing pipeline with feature engineering, selection, and data balancing techniques. Our AI models demonstrated comparable performance to models trained using CL, IID, and non-IID approaches. They showcased advantages in reduced complexity and faster training time, making them well-suited for cloud-edge architectures.
Submitted 5 January, 2024; v1 submitted 23 August, 2022;
originally announced August 2022.
-
On the Complexity of Rational Verification
Authors:
Julian Gutierrez,
Muhammad Najib,
Giuseppe Perelli,
Michael Wooldridge
Abstract:
Rational verification refers to the problem of checking which temporal logic properties hold of a concurrent multiagent system, under the assumption that agents in the system choose strategies that form a game-theoretic equilibrium. Rational verification can be understood as a counterpart to model checking for multiagent systems, but while classical model checking can be done in polynomial time for some temporal logic specification languages such as CTL, and polynomial space with LTL specifications, rational verification is much harder: the key decision problems for rational verification are 2EXPTIME-complete with LTL specifications, even when using explicit-state system representations. Against this background, our contributions in this paper are threefold. First, we show that the complexity of rational verification can be greatly reduced by restricting specifications to GR(1), a fragment of LTL that can represent a broad and practically useful class of response properties of reactive systems. In particular, we show that for a number of relevant settings, rational verification can be done in polynomial space and even in polynomial time. Second, we provide improved complexity results for rational verification when considering players' goals given by mean-payoff utility functions; arguably the most widely used approach for quantitative objectives in concurrent and multiagent systems. Finally, we consider the problem of computing outcomes that satisfy social welfare constraints. To this end, we consider both utilitarian and egalitarian social welfare and show that computing such outcomes is either PSPACE-complete or NP-complete.
Submitted 6 July, 2022;
originally announced July 2022.
-
Emerging Immersive Communication Systems: Overview, Taxonomy, and Good Practises for QoE Assessment
Authors:
Pablo Pérez,
Ester Gonzalez-Sosa,
Jesús Gutiérrez,
Narciso García
Abstract:
Several technological and scientific advances have been achieved recently in the field of immersive systems, which are offering new possibilities to applications and services in different communication domains, such as entertainment, virtual conferencing, working meetings, social relations, healthcare, and industry. Users of these immersive technologies can explore and experience the stimuli in a more interactive and personalized way than with previous technologies. Thus, considering the new technological challenges related to these systems and the new perceptual dimensions and interaction behaviors involved, a deep understanding of the users' Quality of Experience (QoE) is required to satisfy their demands and expectations. In this sense, it is essential to foster research on evaluating the QoE of immersive communication systems, since this will provide useful outcomes to optimize them and to identify the factors that can deteriorate the user experience. With this aim, subjective tests are usually performed following standard methodologies, which are designed for specific technologies and services. Although numerous user studies have already been published, there are no recommendations or standards that define common testing methodologies to be applied to evaluate immersive communication systems, such as those developed for images and video. Therefore, a revision of the QoE evaluation methods designed for previous technologies is required to develop robust and reliable methodologies for immersive communication systems. Thus, the objective of this paper is to provide an overview of existing immersive communication systems and related user studies, which can help in defining basic guidelines and testing methodologies to be used when performing user tests of immersive communication systems, such as 360-degree video-based telepresence, avatar-based social VR, cooperative AR, etc.
Submitted 1 September, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Local permutation polynomials and the action of e-Klenian groups
Authors:
Jaime Gutierrez,
Jorge Jimenez Urroz
Abstract:
Permutation polynomials of finite fields have many applications in Coding Theory, Cryptography and Combinatorics. In the first part of this paper we present a new family of local permutation polynomials based on a class of symmetric subgroups without fixed points, the so-called e-Klenian groups. In the second part we use the fact that bivariate local permutation polynomials define Latin Squares to discuss several constructions of Mutually Orthogonal Latin Squares (MOLS) and, in particular, we provide a new family of MOLS of size a prime power.
Submitted 29 April, 2022;
originally announced May 2022.
-
A Probabilistic Chemical Programmable Computer
Authors:
Abhishek Sharma,
Marcus Tze-Kiat Ng,
Juan Manuel Parrilla Gutierrez,
Yibin Jiang,
Leroy Cronin
Abstract:
The exponential growth of the power of modern digital computers is based upon the miniaturisation of vast nanoscale arrays of electronic switches, but this will eventually be constrained by fabrication limits and power dissipation. Chemical processes have the potential to scale beyond these limits, performing computations through chemical reactions, yet the lack of well-defined programmability limits their scalability and performance. We present a hybrid digitally programmable chemical array as a probabilistic computational machine that uses chemical oscillators partitioned in interconnected cells as a computational substrate. This hybrid architecture performs efficient computation by distributing between chemical and digital domains together with error correction. The efficiency is gained by combining digital with probabilistic chemical logic based on nearest neighbour interactions and hysteresis effects. We demonstrated the implementation of one- and two-dimensional Chemical Cellular Automata and solutions to combinatorial optimization problems.
Submitted 28 April, 2022;
originally announced April 2022.
-
Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again
Authors:
Bernal Jiménez Gutiérrez,
Nikolas McNeal,
Clay Washington,
You Chen,
Lang Li,
Huan Sun,
Yu Su
Abstract:
The strong few-shot in-context learning capability of large pre-trained language models (PLMs) such as GPT-3 is highly appealing for application domains such as biomedicine, which feature high and diverse demands of language technologies but also high data annotation costs. In this paper, we present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs on two highly representative biomedical information extraction tasks, named entity recognition and relation extraction. We follow the true few-shot setting to avoid overestimating models' few-shot performance by model selection over a large validation set. We also optimize GPT-3's performance with known techniques such as contextual calibration and dynamic in-context example retrieval. However, our results show that GPT-3 still significantly underperforms compared to simply fine-tuning a smaller PLM. In addition, GPT-3 in-context learning also yields smaller gains in accuracy when more training data becomes available. Our in-depth analyses further reveal issues of the in-context learning setting that may be detrimental to information extraction tasks in general. Given the high cost of experimenting with GPT-3, we hope our study provides guidance for biomedical researchers and practitioners towards more promising directions such as fine-tuning small PLMs.
Submitted 5 November, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Path eccentricity of graphs
Authors:
Renzo Gómez,
Juan Gutiérrez
Abstract:
Let $G$ be a connected graph. The eccentricity of a path $P$, denoted by ecc$_G(P)$, is the maximum distance from $P$ to any vertex in $G$. In the \textsc{Central path} (CP) problem our aim is to find a path of minimum eccentricity. This problem was introduced by Cockayne et al., in 1981, in the study of different centrality measures on graphs. They showed that CP can be solved in linear time in trees, but it is known to be NP-hard in many classes of graphs such as chordal bipartite graphs, planar 3-connected graphs, split graphs, etc.
We investigate the path eccentricity of a connected graph~$G$ as a parameter. Let pe$(G)$ denote the value of ecc$_G(P)$ for a central path $P$ of $G$. We obtain tight upper bounds for pe$(G)$ in some graph classes. We show that pe$(G) \leq 1$ on biconvex graphs and that pe$(G) \leq 2$ on bipartite convex graphs. Moreover, we design algorithms that find such a path in linear time. On the other hand, by investigating the longest paths of a graph, we obtain tight upper bounds for pe$(G)$ on general graphs and $k$-connected graphs.
Finally, we study the relation between a central path and a longest path in a graph. We show that on trees, and bipartite permutation graphs, a longest path is also a central path. Furthermore, for superclasses of these graphs, we exhibit counterexamples for this property.
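The definition of ecc$_G(P)$ above can be made concrete with a small sketch. This is not the authors' algorithm: it only evaluates the eccentricity of a given path by multi-source breadth-first search, assuming an unweighted connected graph stored as an adjacency dictionary. The function name `path_eccentricity` and the toy graph are hypothetical, and the code does not check that `path` really is a path in the graph.

```python
from collections import deque


def path_eccentricity(adj, path):
    """Return ecc_G(P): the largest distance from the path P to any vertex of G.

    adj  -- dict mapping each vertex to a list of its neighbours (assumed format)
    path -- list of vertices forming P (assumed to be a valid path in G)
    """
    # Multi-source BFS: every vertex on the path starts at distance 0.
    dist = {v: 0 for v in path}
    queue = deque(path)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("graph must be connected")
    return max(dist.values())


# Toy tree (hypothetical): spine 0-1-2-3, vertex 4 attached to 1, vertex 5 attached to 4.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1, 5], 5: [4]}

ecc_spine = path_eccentricity(adj, [0, 1, 2, 3])       # vertex 5 is two steps away
ecc_longest = path_eccentricity(adj, [3, 2, 1, 4, 5])  # only vertex 0 is off the path
```

In this toy tree the longest path 3-2-1-4-5 has eccentricity 1 while the path 0-1-2-3 has eccentricity 2, in line with the abstract's statement that on trees a longest path is also a central path.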
Submitted 5 February, 2022;
originally announced February 2022.
-
Tailoring the Cyber Security Framework: How to Overcome the Complexities of Secure Live Virtual Machine Migration in Cloud Computing
Authors:
Hanif Deylami,
Jairo Gutierrez,
Roopak Sinha
Abstract:
This paper proposes a novel secure live virtual machine migration framework by using a virtual trusted platform module instance to improve the integrity of the migration process from one virtual machine to another on the same platform. The proposed framework, called Kororā, is designed and developed on a public infrastructure-as-a-service cloud-computing environment and runs concurrently on the same hardware components (Input/Output, Central Processing Unit, Memory) and the same hypervisor (Xen); however, a combination of parameters needs to be evaluated before implementing Kororā. The implementation of Kororā is not practically feasible in traditional distributed computing environments. It requires fixed resources with high-performance capabilities, connected through a high-speed, reliable network. The following research objectives were determined to identify the integrity features of live virtual machine migration in the cloud system:
To understand the security issues associated with cloud computing, virtual trusted platform modules, virtualization, live virtual machine migration, and hypervisors; To identify the requirements for the proposed framework, including those related to live VM migration among different hypervisors; To design and validate the model, processes, and architectural features of the proposed framework; To propose and implement an end-to-end security architectural blueprint for cloud environments, providing an integrated view of protection mechanisms, and then to validate the proposed framework to improve the integrity of live VM migration. This is followed by a comprehensive review of the evaluation system architecture and the proposed framework state machine. The overarching aim of this paper, therefore, is to present a detailed analysis of the cloud computing security problem, from the perspective of cloud architectures and the cloud... [Abridged]
Submitted 9 October, 2021;
originally announced October 2021.
-
DNN-assisted Particle-based Bayesian Joint Synchronization and Localization
Authors:
Meysam Goodarzi,
Vladica Sark,
Nebojsa Maletic,
Jesús Gutiérrez,
Giuseppe Caire,
Eckhard Grass
Abstract:
In this work, we propose a Deep neural network-assisted Particle Filter-based (DePF) approach to address the Mobile User (MU) joint synchronization and localization (sync\&loc) problem in ultra dense networks. In particular, DePF deploys an asymmetric time-stamp exchange mechanism between the MUs and the Access Points (APs), which, traditionally, provides us with information about the MUs' clock offset and skew. However, information about the distance between an AP and an MU is also intrinsic to the propagation delay experienced by exchanged time-stamps. In addition, to estimate the angle of arrival of the received synchronization packet, DePF draws on the multiple signal classification algorithm that is fed by the Channel Impulse Response (CIR) experienced by the sync packets. The CIR is also leveraged to determine the link condition, i.e. Line-of-Sight (LoS) or Non-LoS. Finally, to perform joint sync\&loc, DePF capitalizes on particle Gaussian mixtures that allow for a hybrid particle-based and parametric Bayesian Recursive Filtering (BRF) fusion of the aforementioned pieces of information and thus jointly estimate the position and clock parameters of the MUs. The simulation results verify the superiority of the proposed algorithm over the state-of-the-art schemes, especially that of Extended Kalman filter- and linearized BRF-based joint sync\&loc. In particular, only drawing on the synchronization time-stamp exchange and CIRs, for 90$\%$ of the cases, the absolute position and clock offset estimation errors remain below 1 meter and 2 nanoseconds, respectively.
Submitted 2 June, 2022; v1 submitted 29 September, 2021;
originally announced October 2021.
-
Rational Verification for Probabilistic Systems
Authors:
Julian Gutierrez,
Lewis Hammond,
Anthony W. Lin,
Muhammad Najib,
Michael Wooldridge
Abstract:
Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally, by choosing strategies that collectively form a game-theoretic equilibrium. Previous work in this area has largely focussed on deterministic systems. In this paper, we develop the theory and algorithms for rational verification in probabilistic systems. We focus on concurrent stochastic games (CSGs), which can be used to model uncertainty and randomness in complex multi-agent environments. We study the rational verification problem for both non-cooperative games and cooperative games in the qualitative probabilistic setting. In the former case, we consider LTL properties satisfied by the Nash equilibria of the game and in the latter case LTL properties satisfied by the core. In both cases, we show that the problem is 2EXPTIME-complete, thus not harder than the much simpler verification problem of model checking LTL properties of systems modelled as Markov decision processes (MDPs).
Submitted 26 July, 2021; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Learning complex dependency structure of gene regulatory networks from high dimensional micro-array data with Gaussian Bayesian networks
Authors:
Catharina Elisabeth Graafland,
José Manuel Gutiérrez
Abstract:
Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e. are large-$p$-small-$n$). Moreover, dependencies of various orders co-exist in the datasets. In the Undirected probabilistic Graphical Model (UGM) framework the Glasso algorithm has been proposed to deal with high dimensional micro-array datasets forcing sparsity. Also, modifications of the default Glasso algorithm have been developed to overcome the problem of complex interaction structure. In this work we advocate the use of a simple score-based Hill Climbing algorithm (HC) that learns Gaussian Bayesian Networks (BNs) leaning on Directed Acyclic Graphs (DAGs). We compare HC with Glasso and its modifications in the UGM framework on their capability to reconstruct GRNs from micro-array data belonging to the Escherichia Coli genome. We benefit from the analytical properties of the Joint Probability Density (JPD) function on which both directed and undirected PGMs build to convert DAGs to UGMs.
We conclude that dependencies in complex data are learned best by the HC algorithm, presenting them most accurately and efficiently, simultaneously modelling strong local and weaker but significant global connections coexisting in the gene expression dataset. The HC algorithm adapts intrinsically to the complex dependency structure of the dataset, without forcing a specific structure in advance. On the contrary, Glasso and its modifications model unnecessary dependencies at the expense of the probabilistic information in the network and of a structural bias in the JPD function that can only be relieved by including many parameters.
Submitted 14 February, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Equilibrium Design for Concurrent Games
Authors:
Julian Gutierrez,
Muhammad Najib,
Giuseppe Perelli,
Michael Wooldridge
Abstract:
In game theory, mechanism design is concerned with the design of incentives so that a desired outcome of the game can be achieved. In this paper, we study the design of incentives so that a desirable equilibrium is obtained, for instance, an equilibrium satisfying a given temporal logic property -- a problem that we call equilibrium design. We base our study on a framework where system specifications are represented as temporal logic formulae, games as quantitative concurrent game structures, and players' goals as mean-payoff objectives. In particular, we consider system specifications given by LTL and GR(1) formulae, and show that implementing a mechanism to ensure that a given temporal logic property is satisfied on some/every Nash equilibrium of the game, whenever such a mechanism exists, can be done in PSPACE for LTL properties and in NP/$\Sigma^{P}_{2}$ for GR(1) specifications. We also study the complexity of various related decision and optimisation problems, such as optimality and uniqueness of solutions, and show that the complexities of all such problems lie within the polynomial hierarchy. As an application, equilibrium design can be used as an alternative solution to the rational synthesis and verification problems for concurrent games with mean-payoff objectives whenever no solution exists, or as a technique to repair, whenever possible, concurrent games with undesirable rational outcomes (Nash equilibria) in an optimal way.
Submitted 18 June, 2021;
originally announced June 2021.
-
All longest cycles intersect in partial 3-trees
Authors:
Juan Gutiérrez
Abstract:
We show that all longest cycles intersect in 2-connected partial 3-trees.
Submitted 8 March, 2021;
originally announced March 2021.
-
Methodology to Assess Quality, Presence, Empathy, Attitude, and Attention in 360-degree Videos for Immersive Communications
Authors:
Marta Orduna,
Pablo Pérez,
Jesús Gutiérrez,
Narciso García
Abstract:
This paper analyzes the joint assessment of quality, spatial and social presence, empathy, attitude, and attention in three conditions: (A) visualizing and rating the quality of contents in a Head-Mounted Display (HMD), (B) visualizing the contents in an HMD, and (C) visualizing the contents in an HMD where participants can see their hands and take notes. The experiment simulates an immersive communication where participants attend conversations of different genres and from different acquisition perspectives in the context of international experiences. Video quality is evaluated with the Single-Stimulus Discrete Quality Evaluation (SSDQE) methodology. Spatial and social presence are evaluated with questionnaires adapted from the literature. Initial empathy is assessed with the Interpersonal Reactivity Index (IRI) and a questionnaire is designed to evaluate attitude. Attention is evaluated with 3 questions that had pass/fail answers. 54 participants were evenly distributed among the A, B, and C conditions taking into account their international experience backgrounds, obtaining a diverse sample of participants. The results from the subjective test validate the proposed methodology in VR communications, showing that video quality experiments can be adapted to conditions imposed by experiments focused on the evaluation of socioemotional features in terms of long-duration contents, actor and observer acquisition perspectives, and genre. In addition, the positive results related to the sense of presence imply that technology can be relevant in the analyzed use case. The acquisition perspective greatly influences social presence and all the contents have a positive impact on the attitude of all participants towards international experiences. The annotated dataset, Student Experiences Around the World dataset (SEAW-dataset), obtained from the experiment is made publicly available.
Submitted 9 February, 2022; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Multi-Agent Reinforcement Learning with Temporal Logic Specifications
Authors:
Lewis Hammond,
Alessandro Abate,
Julian Gutierrez,
Michael Wooldridge
Abstract:
In this paper, we study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment, which may exhibit probabilistic behaviour. From a learning perspective these specifications provide a rich formal language with which to capture tasks or objectives, while from a logic and automated verification perspective the introduction of learning capabilities allows for practical applications in large, stochastic, unknown environments. The existing work in this area is, however, limited. Of the frameworks that consider full linear temporal logic or have correctness guarantees, all methods thus far consider only the case of a single temporal logic specification and a single agent. In order to overcome this limitation, we develop the first multi-agent reinforcement learning technique for temporal logic specifications, which is also novel in its ability to handle multiple specifications. We provide correctness and convergence guarantees for our main algorithm - ALMANAC (Automaton/Logic Multi-Agent Natural Actor-Critic) - even when using function approximation. Alongside our theoretical results, we further demonstrate the applicability of our technique via a set of preliminary experiments.
Submitted 9 February, 2021; v1 submitted 31 January, 2021;
originally announced February 2021.
-
Multi-Player Games with LDL Goals over Finite Traces
Authors:
Julian Gutierrez,
Giuseppe Perelli,
Michael Wooldridge
Abstract:
Linear Dynamic Logic on finite traces LDLf is a powerful logic for reasoning about the behaviour of concurrent and multi-agent systems.
In this paper, we investigate techniques for both the characterisation and verification of equilibria in multi-player games with goals/objectives expressed using logics based on LDLf. This study builds upon a generalisation of Boolean games, a logic-based game model of multi-agent systems where players have goals succinctly represented in a logical way.
Because LDLf goals are considered, in the settings we study -- Reactive Modules games and iterated Boolean games with goals over finite traces -- players' goals can be defined to be regular properties while achieved in a finite, but arbitrarily large, trace.
In particular, using alternating automata, the paper investigates automata-theoretic approaches to the characterisation and verification of (pure strategy Nash) equilibria, shows that the set of Nash equilibria in multi-player games with LDLf objectives is regular, and provides complexity results for the associated automata constructions.
Submitted 12 August, 2020;
originally announced August 2020.