Showing 1–12 of 12 results for author: Papay, S

Searching in archive cs.
  1. arXiv:2504.12516  [pdf, ps, other]

    cs.CL

    BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

    Authors: Jason Wei, Zhiqing Sun, Spencer Papay, Scott McKinney, Jeffrey Han, Isa Fulford, Hyung Won Chung, Alex Tachard Passos, William Fedus, Amelia Glaese

    Abstract: We present BrowseComp, a simple yet challenging benchmark for measuring the ability for agents to browse the web. BrowseComp comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information. Despite the difficulty of the questions, BrowseComp is simple and easy-to-use, as predicted answers are short and easily verifiable against reference…

    Submitted 16 April, 2025; originally announced April 2025.

  2. arXiv:2503.07179  [pdf, other]

    cs.CL

    Strategies for political-statement segmentation and labelling in unstructured text

    Authors: Dmitry Nikolaev, Sean Papay

    Abstract: Analysis of parliamentary speeches and political-party manifestos has become an integral area of computational study of political texts. While speeches have been overwhelmingly analysed using unsupervised methods, a large corpus of manifestos with by-statement political-stance labels has been created by the participants of the MARPOR project. It has been recently shown that these labels can be pre…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted to NLP4DH 2025 @ NAACL 2025

  3. arXiv:2502.00617  [pdf, other]

    cs.CL

    Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

    Authors: Gabriel Lindenmaier, Sean Papay, Sebastian Padó

    Abstract: Transformer-based language models have recently been at the forefront of active research in text generation. However, these models' advances come at the price of prohibitive training costs, with parameter counts in the billions and compute requirements measured in petaflop/s-decades. In this paper, we investigate transformer-based architectures for improving model performance in a low-data regime…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: PDF has 12 pages total, 7 without references and abstract; 10 individual graphics combined to 3 figures; 5 tables

  4. arXiv:2412.16720  [pdf, other]

    cs.AI

    OpenAI o1 System Card

    Authors: OpenAI, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, et al. (238 additional authors not shown)

    Abstract: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-ar…

    Submitted 21 December, 2024; originally announced December 2024.

  5. arXiv:2411.12484  [pdf, ps, other]

    cs.LG cs.CL

    Regular-pattern-sensitive CRFs for Distant Label Interactions

    Authors: Sean Papay, Roman Klinger, Sebastian Padó

    Abstract: While LLMs have grown popular in sequence labeling, linear-chain conditional random fields (CRFs) remain a popular alternative with the ability to directly model interactions between labels. However, the Markov assumption limits them to only directly modeling interactions between adjacent labels. Weighted finite-state transducers (FSTs), in contrast, can model distant label–label interactions,…

    Submitted 16 June, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

  6. arXiv:2411.04368  [pdf, other]

    cs.CL

    Measuring short-form factuality in large language models

    Authors: Jason Wei, Nguyen Karina, Hyung Won Chung, Yunxin Joy Jiao, Spencer Papay, Amelia Glaese, John Schulman, William Fedus

    Abstract: We present SimpleQA, a benchmark that evaluates the ability of language models to answer short, fact-seeking questions. We prioritized two properties in designing this eval. First, SimpleQA is challenging, as it is adversarially collected against GPT-4 responses. Second, responses are easy to grade, because questions are created such that there exists only a single, indisputable answer. Each answe…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: Blog post: https://openai.com/index/introducing-simpleqa/

  7. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  8. arXiv:2410.08820  [pdf, other]

    cs.CL

    Which Demographics do LLMs Default to During Annotation?

    Authors: Johannes Schäfer, Aidan Combs, Christopher Bagdon, Jiahui Li, Nadine Probol, Lynn Greschner, Sean Papay, Yarik Menchaca Resendiz, Aswathy Velutharambath, Amelie Wührl, Sabine Weber, Roman Klinger

    Abstract: Demographics and cultural background of annotators influence the labels they assign in text annotation – for instance, an elderly woman might find it offensive to read a message addressed to a "bro", but a male teenager might find it appropriate. It is therefore important to acknowledge label variations to not under-represent members of a society. Two research directions developed out of this obs…

    Submitted 28 May, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

    Comments: ACL 2025

  9. arXiv:2402.00620  [pdf, other]

    cs.CL

    Actor Identification in Discourse: A Challenge for LLMs?

    Authors: Ana Barić, Sean Papay, Sebastian Padó

    Abstract: The identification of political actors who put forward claims in public debate is a crucial step in the construction of discourse networks, which are helpful to analyze societal debates. Actor identification is, however, rather challenging: Often, the locally mentioned speaker of a claim is only a pronoun ("He proposed that [claim]"), so recovering the canonical actor name requires discourse under…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Proceedings of the EACL 2024 workshop on Computational Models of Discourse (St. Julian's, Malta)

  10. arXiv:2106.07306  [pdf, ps, other]

    cs.LG cs.CL

    Constraining Linear-chain CRFs to Regular Languages

    Authors: Sean Papay, Roman Klinger, Sebastian Padó

    Abstract: A major challenge in structured prediction is to represent the interdependencies within output structures. When outputs are structured as sequences, linear-chain conditional random fields (CRFs) are a widely used model class which can learn local dependencies in the output. However, the CRF's Markov assumption makes it impossible for CRFs to represent distributions with nonlocal…

    Submitted 11 August, 2023; v1 submitted 14 June, 2021; originally announced June 2021.

  11. arXiv:2010.02587  [pdf, other]

    cs.CL

    Dissecting Span Identification Tasks with Performance Prediction

    Authors: Sean Papay, Roman Klinger, Sebastian Padó

    Abstract: Span identification (in short, span ID) tasks such as chunking, NER, or code-switching detection, ask models to identify and classify relevant spans in a text. Despite being a staple of NLP, and sharing a common structure, there is little insight on how these tasks' properties influence their difficulty, and thus little guidance on what model families work well on span ID tasks, and why. We analyz…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: accepted at EMNLP 2020

  12. arXiv:2008.03164  [pdf, other]

    cs.CL

    IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection

    Authors: Jens Kaiser, Dominik Schlechtweg, Sean Papay, Sabine Schulte im Walde

    Abstract: We present the results of our system for SemEval-2020 Task 1 that exploits a commonly used lexical semantic change detection model based on Skip-Gram with Negative Sampling. Our system focuses on Vector Initialization (VI) alignment, compares VI to the currently top-ranking models for Subtask 2 and demonstrates that these can be outperformed if we optimize VI dimensionality. We demonstrate that di…

    Submitted 7 August, 2020; originally announced August 2020.