Showing 1–8 of 8 results for author: Bondarenko, A

Searching in archive cs.
  1. arXiv:2502.13295  [pdf]

    cs.AI

    Demonstrating specification gaming in reasoning models

    Authors: Alexander Bondarenko, Denis Volk, Dmitrii Volkov, Jeffrey Ladish

    Abstract: We demonstrate LLM agent specification gaming by instructing models to win against a chess engine. We find reasoning models like OpenAI o3 and DeepSeek R1 will often hack the benchmark by default, while language models like GPT-4o and Claude 3.5 Sonnet need to be told that normal play won't work to hack. We improve upon prior work like (Hubinger et al., 2024; Meinke et al., 2024; Weij et al., 20…

    Submitted 15 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: Updated with o3 results

  2. arXiv:2410.21330  [pdf, other]

    cs.CL cs.AI

    LLM Robustness Against Misinformation in Biomedical Question Answering

    Authors: Alexander Bondarenko, Adrian Viehweger

    Abstract: The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering by retrieving and providing additional context coming from external knowledge sources (e.g., by adding the context to the prompt). However, injecting incorrect information can mislead the LLM to generate an incorrect answer. In this paper, we evaluate the e…

    Submitted 27 October, 2024; originally announced October 2024.

  3. Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR

    Authors: Nandan Thakur, Luiz Bonifacio, Maik Fröbe, Alexander Bondarenko, Ehsan Kamalloo, Martin Potthast, Matthias Hagen, Jimmy Lin

    Abstract: The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark -- a combination of different IR evaluation datasets. Interestingly, previous studies found that particularly on the BEIR subset Touché 2020, an argument retrieval task, neural retrieval models are considerably less effective than BM25. Still, so far, no further investigation has been conducted on what…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: SIGIR 2024 (Resource & Reproducibility Track)

  4. arXiv:2211.05035  [pdf, ps, other]

    cs.CL cs.LG

    Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language

    Authors: Denys Amore Bondarenko, Roger Ferrod, Luigi Di Caro

    Abstract: Word embeddings play a significant role in today's Natural Language Processing tasks and applications. While pre-trained models may be directly employed and integrated into existing pipelines, they are often fine-tuned to better fit with specific languages or domains. In this paper, we attempt to improve available embeddings in the uncovered niche of the Italian medical domain through the combinat…

    Submitted 9 November, 2022; originally announced November 2022.

  5. Towards Axiomatic Explanations for Neural Ranking Models

    Authors: Michael Völske, Alexander Bondarenko, Maik Fröbe, Matthias Hagen, Benno Stein, Jaspreet Singh, Avishek Anand

    Abstract: Recently, neural networks have been successfully employed to improve upon state-of-the-art performance in ad-hoc retrieval tasks via machine-learned ranking functions. While neural retrieval models grow in complexity and impact, little is understood about their correspondence with well-studied IR principles. Recent work on interpretability in machine learning has provided tools and techniques to u…

    Submitted 11 July, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 10 pages, 2 figures. Published in the proceedings of ICTIR 2021

  6. Answering Comparative Questions: Better than Ten-Blue-Links?

    Authors: Matthias Schildwächter, Alexander Bondarenko, Julian Zenker, Matthias Hagen, Chris Biemann, Alexander Panchenko

    Abstract: We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl. In a user study, the participants obtained 15% more accurate answers using CAM compared to a "traditional" keyword-based search and were 20% faster in finding the answer to comparative questions.

    Submitted 15 January, 2019; originally announced January 2019.

    Comments: In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (CHIIR '19), March 10–14, 2019, Glasgow, United Kingdom

  7. arXiv:1809.06152  [pdf, other]

    cs.CL

    Categorizing Comparative Sentences

    Authors: Alexander Panchenko, Alexander Bondarenko, Mirco Franzek, Matthias Hagen, Chris Biemann

    Abstract: We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)). To this end, we manually annotate 7,199 sentences for 217 distinct target item pairs from several domains (27% of the sentences contain an oriented comparison in the sense of "better" or "worse"). A gradien…

    Submitted 8 July, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

    Comments: In Proceedings of the 6th Workshop on Argument Mining (ArgMining'2019), August 1st, co-located with ACL 2019 in Florence, Italy

  8. arXiv:1004.5262  [pdf, ps, other]

    cs.NE math.OC

    On Application of the Local Search and the Genetic Algorithms Techniques to Some Combinatorial Optimization Problems

    Authors: Anton Bondarenko

    Abstract: In this paper the approach to solving several combinatorial optimization problems using the local search and the genetic algorithm techniques is proposed. Initially this approach was developed in purpose to overcome some difficulties inhibiting the application of above mentioned techniques to the problems of the Questionnaire Theory. But when the algorithms were developed it became clear that them…

    Submitted 29 April, 2010; originally announced April 2010.

    MSC Class: 90C27; 68P10