Showing 1–2 of 2 results for author: Korgul, K

Searching in archive cs.
  1. arXiv:2503.02972 [pdf, other]

    cs.CL cs.AI

    LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation

    Authors: Jude Khouja, Karolina Korgul, Simi Hellsten, Lingyi Yang, Vlad Neacsu, Harry Mayne, Ryan Kearns, Andrew Bean, Adam Mahdi

    Abstract: The expanding knowledge and memorisation capacity of frontier language models allows them to solve many reasoning tasks directly by exploiting prior knowledge, leading to inflated estimates of their reasoning abilities. We introduce LINGOLY-TOO, a challenging reasoning benchmark grounded in natural language and designed to counteract the effect of non-reasoning abilities on reasoning estimates. Us…

    Submitted 28 May, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  2. arXiv:2310.07225 [pdf, ps, other]

    cs.CL

    Do Large Language Models have Shared Weaknesses in Medical Question Answering?

    Authors: Andrew M. Bean, Karolina Korgul, Felix Krones, Robert McCraith, Adam Mahdi

    Abstract: Large language models (LLMs) have made rapid improvement on medical benchmarks, but their unreliability remains a persistent challenge for safe real-world use. Designing for the use of LLMs as a category, rather than for specific models, requires developing an understanding of the shared strengths and weaknesses that appear across models. To address this challenge, we benchmark a range of top LLMs and…

    Submitted 11 October, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 8 pages, 10 figures. To appear in NeurIPS 2024 Advancements in Medical Foundation Models Workshop