Showing 1–2 of 2 results for author: Erisken, S

Searching in archive cs.
  1. arXiv:2506.03053  [pdf, ps, other]

    cs.MA cs.AI cs.CL cs.CY cs.LG

    MAEBE: Multi-Agent Emergent Behavior Framework

    Authors: Sinem Erisken, Timothy Gothard, Martin Leitgab, Ram Potham

    Abstract: Traditional AI safety evaluations on isolated LLMs are insufficient as multi-agent AI ensembles become prevalent, introducing novel emergent risks. This paper introduces the Multi-Agent Emergent Behavior Evaluation (MAEBE) framework to systematically assess such risks. Using MAEBE with the Greatest Good Benchmark (and a novel double-inversion question technique), we demonstrate that: (1) LLM moral…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Preprint. This work has been submitted to the Multi-Agent Systems Workshop at ICML 2025 for review

  2. arXiv:2412.00944  [pdf, other]

    cs.LG cs.AI

    Bilinear Convolution Decomposition for Causal RL Interpretability

    Authors: Narmeen Oozeer, Sinem Erisken, Alice Rigg

    Abstract: Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing, which provide only correlational insights and coarse causal control. This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, to produce a class of models for which these limitations can be addressed. We show bilinear mod…

    Submitted 1 December, 2024; originally announced December 2024.

    Comments: 8 pages, 10 figures