Search | arXiv e-print repository

Waste Not, Want Not; Recycled Gumbel Noise Improves Consistency in Natural Language Generation

Authors: Damien de Mijolla, Hannan Saddiq, Kim Moore

Abstract: Consistency in the output of language models is critical for their reliability and practical utility. Due to their training objective, language models learn to model the full space of possible continuations, leading to outputs that can vary significantly in style and content, even for similar or repeated inputs. To address this, we propose a novel decoding algorithm that enhances response consiste… ▽ More Consistency in the output of language models is critical for their reliability and practical utility. Due to their training objective, language models learn to model the full space of possible continuations, leading to outputs that can vary significantly in style and content, even for similar or repeated inputs. To address this, we propose a novel decoding algorithm that enhances response consistency across different prompts with no degradation in response quality. By incorporating a latent variable into the next-token sampling process based on the Gumbel reparametrisation trick, our method outperforms standard sampling by up to 10% across semantic and stylistic consistency benchmarks. Additionally, our approach integrates seamlessly with existing sampling methods with negligible computational overhead, providing a practical solution for improving the reliability of language model outputs. △ Less

Submitted 2 March, 2025; originally announced March 2025.

Comments: Accepted to NAACL 2025 main conference; Presented at Neurips Safe Generative AI workshop

arXiv:2412.05967 [pdf, other]

Language hooks: a modular framework for augmenting LLM reasoning that decouples tool usage from the model and its prompt

Authors: Damien de Mijolla, Wen Yang, Philippa Duckett, Christopher Frye, Mark Worrall

Abstract: Prompting and fine-tuning have emerged as two competing paradigms for augmenting language models with new capabilities, such as the use of tools. Prompting approaches are quick to set up but rely on providing explicit demonstrations of each tool's usage in the model's prompt, thus coupling tool use to the task at hand and limiting generalisation. Fine-tuning removes the need for task-specific demo… ▽ More Prompting and fine-tuning have emerged as two competing paradigms for augmenting language models with new capabilities, such as the use of tools. Prompting approaches are quick to set up but rely on providing explicit demonstrations of each tool's usage in the model's prompt, thus coupling tool use to the task at hand and limiting generalisation. Fine-tuning removes the need for task-specific demonstrations of tool usage at runtime; however, this ties new capabilities to a single model, thus making already-heavier setup costs a recurring expense. In this paper, we introduce language hooks, a novel framework for augmenting language models with new capabilities that is decoupled both from the model's task-specific prompt and from the model itself. The language hook algorithm interleaves text generation by the base model with the execution of modular programs that trigger conditionally based on the existing text and the available capabilities. Upon triggering, programs may call external tools, auxiliary language models (e.g. using tool specific prompts), and modify the existing context. We benchmark our method against state-of-the-art baselines, find that it outperforms task-aware approaches, and demonstrate its ability to generalise to novel tasks. △ Less

Submitted 8 December, 2024; originally announced December 2024.

Comments: This work was conducted during Summer 2023. Experimental results and references reflect the state of the field at that time and may not account for subsequent developments

arXiv:2110.02250 [pdf, other]

Measuring chemical likeness of stars with RSCA

Authors: Damien de Mijolla, Melissa K. Ness

Abstract: Identification of chemically similar stars using elemental abundances is core to many pursuits within Galactic archaeology. However, measuring the chemical likeness of stars using abundances directly is limited by systematic imprints of imperfect synthetic spectra in abundance derivation. We present a novel data-driven model that is capable of identifying chemically similar stars from spectra alon… ▽ More Identification of chemically similar stars using elemental abundances is core to many pursuits within Galactic archaeology. However, measuring the chemical likeness of stars using abundances directly is limited by systematic imprints of imperfect synthetic spectra in abundance derivation. We present a novel data-driven model that is capable of identifying chemically similar stars from spectra alone. We call this Relevant Scaled Component Analysis (RSCA). RSCA finds a mapping from stellar spectra to a representation that optimizes recovery of known open clusters. By design, RSCA amplifies factors of chemical abundance variation and minimizes those of non-chemical parameters, such as instrument systematics. The resultant representation of stellar spectra can therefore be used for precise measurements of chemical similarity between stars. We validate RSCA using 185 cluster stars in 22 open clusters in the APOGEE survey. We quantify our performance in measuring chemical similarity using a reference set of 151,145 field stars. We find that our representation identifies known stellar siblings more effectively than stellar abundance measurements. Using RSCA, 1.8% of pairs of field stars are as similar as birth siblings, compared to 2.3% when using stellar abundance labels. We find that almost all of the information within spectra leveraged by RSCA fits into a two-dimensional basis, which we link to [Fe/H] and alpha-element abundances. We conclude that chemical tagging of stars to their birth clusters remains prohibitive. However, using the spectra has noticeable gain, and our approach is poised to benefit from larger datasets and improved algorithm designs. △ Less

Submitted 5 October, 2021; originally announced October 2021.

Comments: submitted to ApJ, 16 pages, code:https://github.com/drd13/RSCA

arXiv:2103.06377 [pdf, other]

doi 10.3847/1538-4357/abece1

Disentangled Representation Learning for Astronomical Chemical Tagging

Authors: Damien de Mijolla, Melissa Ness, Serena Viti, Adam Wheeler

Abstract: Modern astronomical surveys are observing spectral data for millions of stars. These spectra contain chemical information that can be used to trace the Galaxy's formation and chemical enrichment history. However, extracting the information from spectra, and making precise and accurate chemical abundance measurements are challenging. Here, we present a data-driven method for isolating the chemical… ▽ More Modern astronomical surveys are observing spectral data for millions of stars. These spectra contain chemical information that can be used to trace the Galaxy's formation and chemical enrichment history. However, extracting the information from spectra, and making precise and accurate chemical abundance measurements are challenging. Here, we present a data-driven method for isolating the chemical factors of variation in stellar spectra from those of other parameters (i.e. \teff, \logg, \feh). This enables us to build a spectral projection for each star with these parameters removed. We do this with no ab initio knowledge of elemental abundances themselves, and hence bypass the uncertainties and systematics associated with modeling that rely on synthetic stellar spectra. To remove known non-chemical factors of variation, we develop and implement a neural network architecture that learns a disentangled spectral representation. We simulate our recovery of chemically identical stars using the disentangled spectra in a synthetic APOGEE-like dataset. We show that this recovery declines as a function of the signal to noise ratio, but that our neural network architecture outperforms simpler modeling choices. Our work demonstrates the feasibility of data-driven abundance-free chemical tagging. △ Less

Submitted 10 March, 2021; originally announced March 2021.

Comments: Accepted by ApJ. code available at github.com/drd13/tagging-package

arXiv:2010.07384 [pdf, other]

Human-interpretable model explainability on high-dimensional data

Authors: Damien de Mijolla, Christopher Frye, Markus Kunesch, John Mansir, Ilya Feige

Abstract: The importance of explainability in machine learning continues to grow, as both neural-network architectures and the data they model become increasingly complex. Unique challenges arise when a model's input features become high dimensional: on one hand, principled model-agnostic approaches to explainability become too computationally expensive; on the other, more efficient explainability algorithm… ▽ More The importance of explainability in machine learning continues to grow, as both neural-network architectures and the data they model become increasingly complex. Unique challenges arise when a model's input features become high dimensional: on one hand, principled model-agnostic approaches to explainability become too computationally expensive; on the other, more efficient explainability algorithms lack natural interpretations for general users. In this work, we introduce a framework for human-interpretable explainability on high-dimensional data, consisting of two modules. First, we apply a semantically meaningful latent representation, both to reduce the raw dimensionality of the data, and to ensure its human interpretability. These latent features can be learnt, e.g. explicitly as disentangled representations or implicitly through image-to-image translation, or they can be based on any computable quantities the user chooses. Second, we adapt the Shapley paradigm for model-agnostic explainability to operate on these latent features. This leads to interpretable model explanations that are both theoretically controlled and computationally tractable. We benchmark our approach on synthetic data and demonstrate its effectiveness on several image-classification tasks. △ Less

Submitted 20 December, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: 8 pages, 6 figures, 1 appendix

arXiv:2006.01272 [pdf, other]

Shapley explainability on the data manifold

Authors: Christopher Frye, Damien de Mijolla, Tom Begley, Laurence Cowton, Megan Stanley, Ilya Feige

Abstract: Explainability in AI is crucial for model development, compliance with regulation, and providing operational nuance to predictions. The Shapley framework for explainability attributes a model's predictions to its input features in a mathematically principled and model-agnostic way. However, general implementations of Shapley explainability make an untenable assumption: that the model's features ar… ▽ More Explainability in AI is crucial for model development, compliance with regulation, and providing operational nuance to predictions. The Shapley framework for explainability attributes a model's predictions to its input features in a mathematically principled and model-agnostic way. However, general implementations of Shapley explainability make an untenable assumption: that the model's features are uncorrelated. In this work, we demonstrate unambiguous drawbacks of this assumption and develop two solutions to Shapley explainability that respect the data manifold. One solution, based on generative modelling, provides flexible access to data imputations; the other directly learns the Shapley value-function, providing performance and stability at the cost of flexibility. While "off-manifold" Shapley values can (i) give rise to incorrect explanations, (ii) hide implicit model dependence on sensitive attributes, and (iii) lead to unintelligible explanations in higher-dimensional data, on-manifold explainability overcomes these problems. △ Less

Submitted 20 December, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: To appear in ICLR 2021; 9 pages, 6 figures, 2 appendices

Showing 1–6 of 6 results for author: de Mijolla, D