Showing 1–7 of 7 results for author: Tworkowski, S

  1. Analysing The Impact of Sequence Composition on Language Model Pre-Training

    Authors: Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini

    Abstract: Most language model pre-training frameworks concatenate multiple documents into fixed-length sequences and use causal masking to compute the likelihood of each token given its context; this strategy is widely adopted due to its simplicity and efficiency. However, to this day, the influence of the pre-training sequence composition strategy on the generalisation properties of the model remains under…

    Submitted 21 February, 2024; originally announced February 2024.

    Journal ref: Analysing The Impact of Sequence Composition on Language Model Pre-Training (Zhao et al., ACL 2024)

  2. arXiv:2312.17296

    cs.CL

    Structured Packing in LLM Training Improves Long Context Utilization

    Authors: Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś

    Abstract: Recent advancements in long-context large language models have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. This study investigates structuring training data to enhance semantic interdependence, demonstrating that this approach effectively improves context utilization. To this end, we introduce the Structured Packing for Long C…

    Submitted 27 February, 2025; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: AAAI'25

  3. arXiv:2307.05337

    cs.CL

    Explaining Competitive-Level Programming Solutions using LLMs

    Authors: Jierui Li, Szymon Tworkowski, Yingying Wu, Raymond Mooney

    Abstract: In this paper, we approach competitive-level programming problem-solving as a composite task of reasoning and code generation. We propose a novel method to automatically annotate natural language explanations to <problem, solution> pairs. We show that despite poor performance in solving competitive-level programming problems, state-of-the-art LLMs exhibit a strong capacity in describing a…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: 14 pages, presented at the 1st NLRSE workshop

  4. arXiv:2307.03170

    cs.CL cs.AI cs.LG

    Focused Transformer: Contrastive Training for Context Scaling

    Authors: Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś

    Abstract: Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained due to a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which comprises of (key, value) pairs. Yet, as the number of documents increas…

    Submitted 30 November, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Accepted at 37th Conference on Neural Information Processing Systems (NeurIPS 2023). 28 pages, 10 figures, 11 tables

  5. arXiv:2303.04488

    cs.LG cs.AI cs.LO

    Magnushammer: A Transformer-Based Approach to Premise Selection

    Authors: Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu

    Abstract: This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without th…

    Submitted 18 March, 2024; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: ICLR 2024

  6. arXiv:2205.10893

    cs.AI

    Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

    Authors: Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik

    Abstract: In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and…

    Submitted 22 May, 2022; originally announced May 2022.

  7. arXiv:2110.13711

    cs.LG cs.CL

    Hierarchical Transformers Are More Efficient Language Models

    Authors: Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski

    Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility.…

    Submitted 16 April, 2022; v1 submitted 26 October, 2021; originally announced October 2021.