Showing 1–13 of 13 results for author: Amayuelas, A

  1. arXiv:2504.07072  [pdf, other]

    cs.CL cs.CV

    Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

    Authors: Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary, Sharad Duwal, Alfonso Amayuelas, Swati Rajwal, Jebish Purbey, Ahmed Ruby, Nicholas Popovič, Marek Suppa, Azmine Toushik Wasi, Ram Mohan Rao Kadiyala, Olga Tsymboi, et al. (20 additional authors not shown)

    Abstract: The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded, both in size and languages, many rely on translations of English datasets, failing to capture cultural nuances. In this work, we propose Kaleidoscope, as the most comprehensive exam b…

    Submitted 29 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: v2: corrected the author list

  2. arXiv:2504.02051  [pdf, other]

    cs.MA cs.AI cs.CL

    Self-Resource Allocation in Multi-Agent LLM Systems

    Authors: Alfonso Amayuelas, Jingbo Yang, Saaket Agashe, Ashwin Nagarajan, Antonis Antoniades, Xin Eric Wang, William Wang

    Abstract: With the development of LLMs as agents, there is a growing interest in connecting multiple agents into multi-agent systems to solve tasks concurrently, focusing on their role in task assignment and coordination. This paper explores how LLMs can effectively allocate computational tasks among multiple agents, considering factors such as cost, efficiency, and performance. In this work, we address key…

    Submitted 19 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  3. arXiv:2502.13247  [pdf, other]

    cs.CL

    Grounding LLM Reasoning with Knowledge Graphs

    Authors: Alfonso Amayuelas, Joy Sain, Simerjot Kaur, Charese Smiley

    Abstract: Knowledge Graphs (KGs) are valuable tools for representing relationships between entities in a structured format. Traditionally, these knowledge bases are queried to extract specific information. However, question-answering (QA) over such KGs poses a challenge due to the intrinsic complexity of natural language compared to the structured format and the size of these graphs. Despite these challenge…

    Submitted 21 February, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  4. arXiv:2411.19799  [pdf, other]

    cs.CL

    INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

    Authors: Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, et al. (34 additional authors not shown)

    Abstract: The performance differential of large language models (LLM) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other th…

    Submitted 29 November, 2024; originally announced November 2024.

  5. arXiv:2411.05990  [pdf, other]

    cs.AI cs.CL cs.GT cs.LG cs.MA

    Game-theoretic LLM: Agent Workflow for Negotiation Games

    Authors: Wenyue Hua, Ollie Liu, Lingyao Li, Alfonso Amayuelas, Julie Chen, Lucas Jiang, Mingyu Jin, Lizhou Fan, Fei Sun, William Wang, Xintong Wang, Yongfeng Zhang

    Abstract: This paper investigates the rationality of large language models (LLMs) in strategic decision-making contexts, specifically within the framework of game theory. We evaluate several state-of-the-art LLMs across a spectrum of complete-information and incomplete-information games. Our findings reveal that LLMs frequently deviate from rational strategies, particularly as the complexity of the game inc…

    Submitted 12 November, 2024; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: 45 pages, 12 figures

  6. arXiv:2407.14985  [pdf, other]

    cs.CL cs.AI cs.LG

    Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

    Authors: Xinyi Wang, Antonis Antoniades, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang

    Abstract: The impressive capabilities of large language models (LLMs) have sparked debate over whether these models genuinely generalize to unseen tasks or predominantly rely on memorizing vast amounts of pretraining data. To explore this issue, we introduce an extended concept of memorization, distributional memorization, which measures the correlation between the LLM output probabilities and the pretraini…

    Submitted 1 March, 2025; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted to ICLR 2025

  7. arXiv:2407.06426  [pdf, other]

    cs.CL cs.AI cs.MA

    DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics

    Authors: Luke Yoffe, Alfonso Amayuelas, William Yang Wang

    Abstract: Multi-agent debates have been introduced to improve the accuracy of Large Language Models (LLMs) by having multiple agents discuss solutions to a problem over several rounds of debate. However, models often generate incorrect yet confident-sounding responses, which can mislead others. This issue arises partly because agents do not consider how confident their peers are. To address this, we propose…

    Submitted 21 February, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2406.14867  [pdf, other]

    cs.LG cs.AI cs.CL

    Investigating the Transferability of Code Repair for Low-Resource Programming Languages

    Authors: Kyle Wong, Alfonso Amayuelas, Liangming Pan, William Yang Wang

    Abstract: Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent use case is iterative code repair, where an LLM fixes an incorrect program by rationalizing about errors and generating new code. Recent works augment the code repair process by integrating modern techniques such as chain-of-thought reasoning or distillation, but only study their benefits on high-reso…

    Submitted 16 October, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  9. arXiv:2406.14711  [pdf, other]

    cs.CL cs.AI cs.MA

    MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate

    Authors: Alfonso Amayuelas, Xianjun Yang, Antonis Antoniades, Wenyue Hua, Liangming Pan, William Wang

    Abstract: Large Language Models (LLMs) have shown exceptional results on current benchmarks when working individually. The advancement in their capabilities, along with a reduction in parameter size and inference times, has facilitated the use of these models as agents, enabling interactions among multiple models to execute complex tasks. Such collaborations offer several advantages, including the use of sp…

    Submitted 26 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  10. arXiv:2402.03268  [pdf, other]

    cs.LG cs.AI cs.CL

    Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

    Authors: Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang

    Abstract: Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning. To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time. We found this perspective effective in t…

    Submitted 20 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

  11. arXiv:2305.13712  [pdf, other]

    cs.CL cs.AI

    Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models

    Authors: Alfonso Amayuelas, Kyle Wong, Liangming Pan, Wenhu Chen, William Wang

    Abstract: This paper investigates the capabilities of Large Language Models (LLMs) in the context of understanding their knowledge and uncertainty over questions. Specifically, we focus on addressing known-unknown questions, characterized by high uncertainty due to the absence of definitive answers. To facilitate our study, we collect a new dataset with Known-Unknown Questions (KUQ) and establish a categori…

    Submitted 1 July, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  12. arXiv:2209.14464  [pdf, other]

    cs.AI cs.LG

    Neural Methods for Logical Reasoning Over Knowledge Graphs

    Authors: Alfonso Amayuelas, Shuai Zhang, Susie Xi Rao, Ce Zhang

    Abstract: Reasoning is a fundamental problem for computers and deeply studied in Artificial Intelligence. In this paper, we specifically focus on answering multi-hop logical queries on Knowledge Graphs (KGs). This is a complicated task because, in real-world scenarios, the graphs tend to be large and incomplete. Most previous works have been unable to create models that accept full First-Order Logical (FOL)…

    Submitted 28 September, 2022; originally announced September 2022.

    Comments: 14 pages, 5 figures, 11 tables

    Journal ref: International Conference on Learning Representations, 2022

  13. arXiv:2012.11448  [pdf, other]

    cs.LG cs.AI

    The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal Perspective

    Authors: Naman Goel, Alfonso Amayuelas, Amit Deshpande, Amit Sharma

    Abstract: Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those who were not. This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed. Using causal graphs, we charact…

    Submitted 21 December, 2020; originally announced December 2020.

    Comments: To appear in the Proceedings of AAAI 2021