Showing 1–16 of 16 results for author: Josifoski, M

  1. arXiv:2403.14562  [pdf, other]

    cs.CL cs.AI cs.HC cs.MA

    Agentic AI: The Era of Semantic Decoding

    Authors: Maxime Peyrard, Martin Josifoski, Robert West

    Abstract: Recent work demonstrated great promise in the idea of orchestrating collaborations between LLMs, human input, and various tools to address the inherent limitations of LLMs. We propose a novel perspective called semantic decoding, which frames these collaborative processes as optimization procedures in semantic space. Specifically, we conceptualize LLMs as semantic processors that manipulate meanin…

    Submitted 29 April, 2025; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 25 pages, 3 figures

  2. arXiv:2402.10575  [pdf, other]

    cs.LG cs.AI

    Symbolic Autoencoding for Self-Supervised Sequence Learning

    Authors: Mohammad Hossein Amani, Nicolas Mario Baldwin, Amin Mansouri, Martin Josifoski, Maxime Peyrard, Robert West

    Abstract: Traditional language models, adept at next-token prediction in text sequences, often struggle with transduction tasks between distinct symbolic systems, particularly when parallel data is scarce. Addressing this issue, we introduce symbolic autoencoding (ΣAE), a self-supervised framework that harnesses the power of abundant unparallel data alongside limited parallel data. ΣAE connects…

    Submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2401.09967  [pdf, other]

    cs.CL

    Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

    Authors: Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West

    Abstract: Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models that give users access to next-token distributions (usually via softmax logits), which poses a limitation with blackbox large language models (LLMs). This paper i…

    Submitted 21 July, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024 Oral

  4. arXiv:2401.04536  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Language Model Agency through Negotiations

    Authors: Tim R. Davidson, Veniamin Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West

    Abstract: We introduce an approach to evaluate language model (LM) agency using negotiation games. This approach better reflects real-world use cases and addresses some of the shortcomings of alternative LM benchmarks. Negotiation games enable us to study multi-turn and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage. We use our approach to test six widely us…

    Submitted 16 March, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024, code and link to project data are made available at https://github.com/epfl-dlab/LAMEN

  5. arXiv:2312.02073  [pdf, other]

    cs.CL cs.AI cs.LG

    A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

    Authors: Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kıcıman, Hamid Palangi, Barun Patra, Robert West

    Abstract: Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context. Yet the mechanisms underlying this contextual grounding remain unknown, especially in situations where contextual information contradicts factual knowledge stored in the parameters, which LLMs also excel at recalling. Favoring the contextual information is critical for retrieval-augmente…

    Submitted 10 June, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted at ACL 2024 (main conference)

  6. arXiv:2308.01285  [pdf, other]

    cs.AI cs.HC

    Flows: Building Blocks of Reasoning and Collaborating AI

    Authors: Martin Josifoski, Lars Klein, Maxime Peyrard, Nicolas Baldwin, Yifei Li, Saibo Geng, Julian Paul Schnitzler, Yuxing Yao, Jiheng Wei, Debjit Paul, Robert West

    Abstract: Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the…

    Submitted 7 February, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

  7. arXiv:2305.15041  [pdf, other]

    cs.CL

    Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science

    Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Akhil Arora, Martin Josifoski, Ashton Anderson, Robert West

    Abstract: Large Language Models (LLMs) have democratized synthetic data generation, which in turn has the potential to simplify and broaden a wide gamut of NLP tasks. Here, we tackle a pervasive problem in synthetic data generation: its generative distribution often differs from the distribution of real-world data researchers care about (in other words, it is unfaithful). In a case study on sarcasm detectio…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 8 pages

  8. arXiv:2305.13971  [pdf, other]

    cs.CL cs.AI cs.LG

    Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning

    Authors: Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West

    Abstract: Despite their impressive performance, large language models (LMs) still struggle with reliably generating complex output structures when not finetuned to follow the required output format exactly. To address this issue, grammar-constrained decoding (GCD) can be used to control the generation of LMs, guaranteeing that the output follows a given structure. Most existing GCD methods are, however, lim…

    Submitted 18 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023 Main Conference

  9. arXiv:2303.04132  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction

    Authors: Martin Josifoski, Marija Sakota, Maxime Peyrard, Robert West

    Abstract: Large language models (LLMs) have great potential for synthetic data generation. This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure. Leveraging this as…

    Submitted 29 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted at EMNLP 2023

  10. arXiv:2211.07206  [pdf, other]

    stat.ML cs.LG

    Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice

    Authors: Jonas Rothfuss, Martin Josifoski, Vincent Fortuin, Andreas Krause

    Abstract: Meta-learning aims to speed up the learning process on new tasks by acquiring useful inductive biases from datasets of related learning tasks. While, in practice, the number of related tasks available is often small, most of the existing approaches assume an abundance of tasks, making them unrealistic and prone to overfitting. A central question in the meta-learning literature is how to regularize…

    Submitted 22 December, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: JMLR, 62 pages, text overlap with arXiv:2002.05551

    Journal ref: Journal of Machine Learning Research (24), 2023, 1-62

  11. arXiv:2210.07228  [pdf, other]

    cs.CL cs.LG

    Language Model Decoding as Likelihood-Utility Alignment

    Authors: Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kıcıman, Boi Faltings, Robert West

    Abstract: A critical component of a successful language generation pipeline is the decoding algorithm. However, the general principles that should guide the choice of a decoding algorithm remain unclear. Previous works only compare decoding algorithms in narrow scenarios, and their findings do not generalize across tasks. We argue that the misalignment between the model's likelihood and the task-specific no…

    Submitted 16 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at EACL (Findings) 2023

  12. arXiv:2112.08340  [pdf, other]

    cs.CL cs.LG stat.ML

    GenIE: Generative Information Extraction

    Authors: Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West

    Abstract: Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealis…

    Submitted 13 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted at NAACL 2022

  13. arXiv:2110.08413  [pdf, other]

    cs.CL cs.LG

    Invariant Language Modeling

    Authors: Maxime Peyrard, Sarvjeet Singh Ghotra, Martin Josifoski, Vidhan Agarwal, Barun Patra, Dean Carignan, Emre Kiciman, Robert West

    Abstract: Large pretrained language models are critical components of modern NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize b…

    Submitted 14 November, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Published at EMNLP 2022

  14. arXiv:2002.05551  [pdf, other]

    stat.ML cs.LG

    PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees

    Authors: Jonas Rothfuss, Vincent Fortuin, Martin Josifoski, Andreas Krause

    Abstract: Meta-learning can successfully acquire useful inductive biases from data. Yet, its generalization properties to unseen learning tasks are poorly understood. Particularly if the number of meta-training tasks is small, this raises concerns about overfitting. We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning. Using these bounds…

    Submitted 18 June, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: International Conference on Machine Learning (ICML) 2021

    MSC Class: 68Q32

  15. arXiv:1911.03814  [pdf, other]

    cs.CL

    Scalable Zero-shot Entity Linking with Dense Entity Retrieval

    Authors: Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, Luke Zettlemoyer

    Abstract: This paper introduces a conceptually simple, scalable, and highly effective BERT-based entity linking model, along with an extensive evaluation of its accuracy-speed trade-off. We present a two-stage zero-shot linking algorithm, where each entity is defined only by a short textual description. The first stage does retrieval in a dense space defined by a bi-encoder that independently embeds the men…

    Submitted 29 September, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

    Comments: Accepted at EMNLP 2020

  16. Crosslingual Document Embedding as Reduced-Rank Ridge Regression

    Authors: Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, Robert West

    Abstract: There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus wher…

    Submitted 8 April, 2019; originally announced April 2019.

    Comments: In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19)