
Showing 1–8 of 8 results for author: Siedler, P D

Searching in archive cs.
  1. arXiv:2505.16048

    cs.AI

    SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution

    Authors: Philipp D. Siedler

    Abstract: We introduce a novel dataset designed to benchmark the physical and spatial reasoning capabilities of Large Language Models (LLMs) based on topology optimization, a method for computing optimal material distributions within a design space under prescribed loads and supports. In this dataset, LLMs are provided with conditions such as a 2D boundary, applied forces and supports, and must reason about th…

    Submitted 27 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  2. arXiv:2503.13553

    cs.MA cs.AI cs.CL

    LLM-Mediated Guidance of MARL Systems

    Authors: Philipp D. Siedler, Ian Gemp

    Abstract: In complex multi-agent environments, achieving efficient learning and desirable behaviours is a significant challenge for Multi-Agent Reinforcement Learning (MARL) systems. This work explores the potential of combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours. Specifically, we investigate how LLMs can be used to interpret and faci…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: 31 pages, 50 figures

  3. arXiv:2501.14249

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  4. arXiv:2501.04180

    cs.MA cs.AI cs.GT

    HIVEX: A High-Impact Environment Suite for Multi-Agent Research (extended version)

    Authors: Philipp Dominic Siedler

    Abstract: Games have been vital test beds for the rapid development of Agent-based research. Remarkable progress has been achieved in the past, but it is unclear if the findings equip for real-world problems. While pressure grows, some of the most critical ecological challenges can find mitigation and prevention solutions through technology and its applications. Most real-world domains include multi-agent s…

    Submitted 21 January, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

  5. arXiv:2304.05872

    cs.AI cs.LG cs.MA

    Learning to Communicate and Collaborate in a Competitive Multi-Agent Setup to Clean the Ocean from Macroplastics

    Authors: Philipp Dominic Siedler

    Abstract: Finding a balance between collaboration and competition is crucial for artificial agents in many real-world applications. We investigate this using a Multi-Agent Reinforcement Learning (MARL) setup on the back of a high-impact problem. The accumulation and yearly growth of plastic in the ocean cause irreparable damage to many aspects of oceanic health and the marine system. To prevent further dama…

    Submitted 6 November, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Tackling Climate Change with Machine Learning Workshop at the 11th International Conference on Learning Representations (ICLR 2023)

  6. arXiv:2211.15414

    cs.AI cs.LG cs.MA cs.RO

    Dynamic Collaborative Multi-Agent Reinforcement Learning Communication for Autonomous Drone Reforestation

    Authors: Philipp Dominic Siedler

    Abstract: We approach autonomous drone-based reforestation with a collaborative multi-agent reinforcement learning (MARL) setup. Agents can communicate as part of a dynamically changing network. We explore collaboration and communication on the back of a high-impact problem. Forests are the main resource to control rising CO2 conditions. Unfortunately, the global forest volume is decreasing at an unpreceden…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Deep Reinforcement Learning Workshop at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  7. arXiv:2204.11350

    cs.AI cs.LG cs.MA

    Collaborative Auto-Curricula Multi-Agent Reinforcement Learning with Graph Neural Network Communication Layer for Open-ended Wildfire-Management Resource Distribution

    Authors: Philipp Dominic Siedler

    Abstract: Most real-world domains can be formulated as multi-agent (MA) systems. Intentionality sharing agents can solve more complex tasks by collaborating, possibly in less time. True cooperative actions are beneficial for egoistic and collective reasons. However, teaching individual agents to sacrifice egoistic benefits for a better collective performance seems challenging. We build on a recently propose…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Gamification and Multiagent Solutions Workshop at the 10th International Conference on Learning Representations (ICLR 2022)

  8. arXiv:2111.15611

    cs.AI cs.LG cs.MA

    The Power of Communication in a Distributed Multi-Agent System

    Authors: Philipp Dominic Siedler

    Abstract: Single-Agent (SA) Reinforcement Learning systems have shown outstanding results on non-stationary problems. However, Multi-Agent Reinforcement Learning (MARL) can surpass SA systems generally and when scaling. Furthermore, MA systems can be super-powered by collaboration, which can happen through observing others, or a communication system used to share information between collaborators. Here, we d…

    Submitted 14 December, 2021; v1 submitted 30 November, 2021; originally announced November 2021.

    Comments: Cooperative AI Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia