Showing 1–7 of 7 results for author: Conklin, H

Searching in archive cs.
  1. arXiv:2505.23960  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation

    Authors: Henry Conklin

    Abstract: Despite the remarkable success of large-scale neural networks, we still lack unified notation for thinking about and describing their representational spaces. We lack methods to reliably describe how their representations are structured, how that structure emerges over training, and what kinds of structures are desirable. This thesis introduces quantitative methods for identifying systematic…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: PhD Thesis, 204 pages; entropy estimation discussed from p.94

  2. arXiv:2505.13737  [pdf, ps, other]

    cs.AI

    Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers

    Authors: Andrew Nam, Henry Conklin, Yukang Yang, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie

    Abstract: We present causal head gating (CHG), a scalable method for interpreting the functional roles of attention heads in transformer models. CHG learns soft gates over heads and assigns them a causal taxonomy - facilitating, interfering, or irrelevant - based on their impact on task performance. Unlike prior approaches in mechanistic interpretability, which are hypothesis-driven and require prompt templ…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures, 2 tables

  3. arXiv:2504.00698  [pdf]

    cs.CL cs.AI cs.LG

    Command A: An Enterprise-Ready Large Language Model

    Authors: Team Cohere, Aakanksha, Arash Ahmadian, Marwan Ahmed, Jay Alammar, Milad Alizadeh, Yazeed Alnumay, Sophia Althammer, Arkady Arkhangorodsky, Viraat Aryabumi, Dennis Aumiller, Raphaël Avalos, Zahara Aviv, Sammie Bae, Saurabh Baji, Alexandre Barbet, Max Bartolo, Björn Bebensee, Neeral Beladia, Walter Beller-Morales, Alexandre Bérard, Andrew Berneshawi, Anna Bialas, Phil Blunsom, et al. (205 additional authors not shown)

    Abstract: In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-capable model, with support for 23 languages of global business, and a novel hybrid architecture balancing efficiency with top-of-the-range performance. It offers best-in-class Retrieval Augmented Genera…

    Submitted 14 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 55 pages

  4. arXiv:2412.14076  [pdf, other]

    cs.AI cs.CL

    Compositional Generalization Across Distributional Shifts with Sparse Tree Operations

    Authors: Paul Soulos, Henry Conklin, Mattia Opper, Paul Smolensky, Jianfeng Gao, Roland Fernandez

    Abstract: Neural networks continue to struggle with compositional generalization, and this issue is exacerbated by a lack of massive pre-training. One successful approach for developing neural systems that exhibit human-like compositional generalization is the use of hybrid neurosymbolic techniques. However, these techniques run into the core issues that plague symbolic approaches to AI: scalability and flex…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: NeurIPS 2024. Code available at https://github.com/psoulos/sdtm

  5. arXiv:2406.02449  [pdf, other]

    cs.CL cs.AI

    Representations as Language: An Information-Theoretic Framework for Interpretability

    Authors: Henry Conklin, Kenny Smith

    Abstract: Large-scale neural models show impressive performance across a wide array of linguistic tasks. Despite this, they remain largely black boxes, inducing vector representations of their input that prove difficult to interpret. This limits our ability to understand what they learn and when they learn it, or to describe what kinds of representations generalise well out of distribution. To address this w…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 6 pages, 3 Figures

  6. arXiv:2308.07984  [pdf, other]

    cs.CL

    Anaphoric Structure Emerges Between Neural Networks

    Authors: Nicholas Edwards, Hannah Rohde, Henry Conklin

    Abstract: Pragmatics is core to natural language, enabling speakers to communicate efficiently with structures like ellipsis and anaphora that can shorten utterances without loss of meaning. These structures require a listener to interpret an ambiguous form - like a pronoun - and infer the speaker's intended meaning - who that pronoun refers to. Despite potential to introduce ambiguity, anaphora is ubiquito…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Published as a conference paper at the Annual Meeting of the Cognitive Science Society 2023: 6 Pages, 3 Figures, code available at https://github.com/hcoxec/emerge

  7. arXiv:2106.04252  [pdf, other]

    cs.CL

    Meta-Learning to Compositionally Generalize

    Authors: Henry Conklin, Bailin Wang, Kenny Smith, Ivan Titov

    Abstract: Natural language is compositional; the meaning of a sentence is a function of the meaning of its parts. This property allows humans to create and interpret novel sentences, generalizing robustly outside their prior experience. Neural networks have been shown to struggle with this kind of generalization, in particular performing poorly on tasks designed to assess compositional generalization (i.e.…

    Submitted 29 June, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: ACL 2021 camera-ready; fixes a small typo