Search | arXiv e-print repository

What's in a prompt? Language models encode literary style in prompt embeddings

Authors: Raphaël Sarfati, Haley Moller, Toni J. B. Liu, Nicolas Boullé, Christopher Earls

Abstract: Large language models use high-dimensional latent spaces to encode and process textual information. Much work has investigated how the conceptual content of words translates into geometrical relationships between their vector representations. Fewer studies analyze how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers.… ▽ More Large language models use high-dimensional latent spaces to encode and process textual information. Much work has investigated how the conceptual content of words translates into geometrical relationships between their vector representations. Fewer studies analyze how the cumulative information of an entire prompt becomes condensed into individual embeddings under the action of transformer layers. We use literary pieces to show that information about intangible, rather than factual, aspects of the prompt are contained in deep representations. We observe that short excerpts (10 - 100 tokens) from different novels separate in the latent space independently from what next-token prediction they converge towards. Ensembles from books from the same authors are much more entangled than across authors, suggesting that embeddings encode stylistic features. This geometry of style may have applications for authorship attribution and literary analysis, but most importantly reveals the sophistication of information processing and compression accomplished by language models. △ Less

Submitted 19 May, 2025; originally announced May 2025.

arXiv:2504.17065 [pdf, other]

Antenna Near-Field Reconstruction from Far-Field Data Using Convolutional Neural Networks

Authors: Sahar Bagherkhani, Jackson Christopher Earls, Franco De Flaviis, Pierre Baldi

Abstract: Electromagnetic field reconstruction is crucial in many applications, including antenna diagnostics, electromagnetic interference analysis, and system modeling. This paper presents a deep learning-based approach for Far-Field to Near-Field (FF-NF) transformation using Convolutional Neural Networks (CNNs). The goal is to reconstruct near-field distributions from the far-field data of an antenna wit… ▽ More Electromagnetic field reconstruction is crucial in many applications, including antenna diagnostics, electromagnetic interference analysis, and system modeling. This paper presents a deep learning-based approach for Far-Field to Near-Field (FF-NF) transformation using Convolutional Neural Networks (CNNs). The goal is to reconstruct near-field distributions from the far-field data of an antenna without relying on explicit analytical transformations. The CNNs are trained on paired far-field and near-field data and evaluated using mean squared error (MSE). The best model achieves a training error of 0.0199 and a test error of 0.3898. Moreover, visual comparisons between the predicted and true near-field distributions demonstrate the model's effectiveness in capturing complex electromagnetic field behavior, highlighting the potential of deep learning in electromagnetic field reconstruction. △ Less

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2501.18715 [pdf, other]

chebgreen: Learning and Interpolating Continuous Empirical Green's Functions from Data

Authors: Harshwardhan Praveen, Jacob Brown, Christopher Earls

Abstract: In this work, we present a mesh-independent, data-driven library, chebgreen, to mathematically model one-dimensional systems, possessing an associated control parameter, and whose governing partial differential equation is unknown. The proposed method learns an Empirical Green's Function for the associated, but hidden, boundary value problem, in the form of a Rational Neural Network from which we… ▽ More In this work, we present a mesh-independent, data-driven library, chebgreen, to mathematically model one-dimensional systems, possessing an associated control parameter, and whose governing partial differential equation is unknown. The proposed method learns an Empirical Green's Function for the associated, but hidden, boundary value problem, in the form of a Rational Neural Network from which we subsequently construct a bivariate representation in a Chebyshev basis. We uncover the Green's function, at an unseen control parameter value, by interpolating the left and right singular functions within a suitable library, expressed as points on a manifold of Quasimatrices, while the associated singular values are interpolated with Lagrange polynomials. △ Less

Submitted 2 May, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

Comments: Code is available at https://github.com/hsharsh/chebgreen

arXiv:2412.15113 [pdf, other]

Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture

Authors: Thomas F Burns, Tomoki Fukai, Christopher J Earls

Abstract: Large language models (LLMs) demonstrate an impressive ability to utilise information within the context of their input sequences to appropriately respond to data unseen by the LLM during its training procedure. This ability is known as in-context learning (ICL). Humans and non-human animals demonstrate similar abilities, however their neural architectures differ substantially from LLMs. Despite t… ▽ More Large language models (LLMs) demonstrate an impressive ability to utilise information within the context of their input sequences to appropriately respond to data unseen by the LLM during its training procedure. This ability is known as in-context learning (ICL). Humans and non-human animals demonstrate similar abilities, however their neural architectures differ substantially from LLMs. Despite this, a critical component within LLMs, the attention mechanism, resembles modern associative memory models, widely used in and influenced by the computational neuroscience community to model biological memory systems. Using this connection, we introduce an associative memory model capable of performing ICL. We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads. We test this architecture during training within a two-layer Transformer and show its ICL abilities manifest more quickly than without this modification. We then apply our architecture in small language models with 8 million parameters, focusing on attention head values, with results also indicating improved ICL performance at this larger and more naturalistic scale. △ Less

Submitted 19 December, 2024; originally announced December 2024.

Comments: 18 pages, 6 figures, 3 tables

MSC Class: 92B20; 68T01; 68T37; 68T50 ACM Class: I.2; I.5; I.7; J.2; J.3

arXiv:2410.05218 [pdf, other]

Density estimation with LLMs: a geometric investigation of in-context learning trajectories

Authors: Toni J. B. Liu, Nicolas Boullé, Raphaël Sarfati, Christopher J. Earls

Abstract: Large language models (LLMs) demonstrate remarkable emergent abilities to perform in-context learning across various tasks, including time series forecasting. This work investigates LLMs' ability to estimate probability density functions (PDFs) from data observed in-context; such density estimation (DE) is a fundamental task underlying many probabilistic modeling problems. We leverage the Intensiv… ▽ More Large language models (LLMs) demonstrate remarkable emergent abilities to perform in-context learning across various tasks, including time series forecasting. This work investigates LLMs' ability to estimate probability density functions (PDFs) from data observed in-context; such density estimation (DE) is a fundamental task underlying many probabilistic modeling problems. We leverage the Intensive Principal Component Analysis (InPCA) to visualize and analyze the in-context learning dynamics of LLaMA-2 models. Our main finding is that these LLMs all follow similar learning trajectories in a low-dimensional InPCA space, which are distinct from those of traditional density estimation methods like histograms and Gaussian kernel density estimation (KDE). We interpret the LLaMA in-context DE process as a KDE with an adaptive kernel width and shape. This custom kernel model captures a significant portion of LLaMA's behavior despite having only two parameters. We further speculate on why LLaMA's kernel width and shape differs from classical algorithms, providing insights into the mechanism of in-context probabilistic reasoning in LLMs. Our codebase, along with a 3D visualization of an LLM's in-context learning trajectory, is publicly available at https://github.com/AntonioLiu97/LLMICL_inPCA △ Less

Submitted 3 March, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.01545 [pdf, other]

Lines of Thought in Large Language Models

Authors: Raphaël Sarfati, Toni J. B. Liu, Nicolas Boullé, Christopher J. Earls

Abstract: Large Language Models achieve next-token prediction by transporting a vectorized piece of text (prompt) across an accompanying embedding space under the action of successive transformer layers. The resulting high-dimensional trajectories realize different contextualization, or 'thinking', steps, and fully determine the output probability distribution. We aim to characterize the statistical propert… ▽ More Large Language Models achieve next-token prediction by transporting a vectorized piece of text (prompt) across an accompanying embedding space under the action of successive transformer layers. The resulting high-dimensional trajectories realize different contextualization, or 'thinking', steps, and fully determine the output probability distribution. We aim to characterize the statistical properties of ensembles of these 'lines of thought.' We observe that independent trajectories cluster along a low-dimensional, non-Euclidean manifold, and that their path can be well approximated by a stochastic equation with few parameters extracted from data. We find it remarkable that the vast complexity of such large models can be reduced to a much simpler form, and we reflect on implications. △ Less

Submitted 13 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

arXiv:2402.00795 [pdf, other]

LLMs learn governing principles of dynamical systems, revealing an in-context neural scaling law

Authors: Toni J. B. Liu, Nicolas Boullé, Raphaël Sarfati, Christopher J. Earls

Abstract: Pretrained large language models (LLMs) are surprisingly effective at performing zero-shot tasks, including time-series forecasting. However, understanding the mechanisms behind such capabilities remains highly challenging due to the complexity of the models. We study LLMs' ability to extrapolate the behavior of dynamical systems whose evolution is governed by principles of physical interest. Our… ▽ More Pretrained large language models (LLMs) are surprisingly effective at performing zero-shot tasks, including time-series forecasting. However, understanding the mechanisms behind such capabilities remains highly challenging due to the complexity of the models. We study LLMs' ability to extrapolate the behavior of dynamical systems whose evolution is governed by principles of physical interest. Our results show that LLaMA 2, a language model trained primarily on texts, achieves accurate predictions of dynamical system time series without fine-tuning or prompt engineering. Moreover, the accuracy of the learned physical rules increases with the length of the input context window, revealing an in-context version of neural scaling law. Along the way, we present a flexible and efficient algorithm for extracting probability density functions of multi-digit numbers directly from LLMs. △ Less

Submitted 9 October, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2309.04699 [pdf, other]

Weak-PDE-LEARN: A Weak Form Based Approach to Discovering PDEs From Noisy, Limited Data

Authors: Robert Stephany, Christopher Earls

Abstract: We introduce Weak-PDE-LEARN, a Partial Differential Equation (PDE) discovery algorithm that can identify non-linear PDEs from noisy, limited measurements of their solutions. Weak-PDE-LEARN uses an adaptive loss function based on weak forms to train a neural network, $U$, to approximate the PDE solution while simultaneously identifying the governing PDE. This approach yields an algorithm that is ro… ▽ More We introduce Weak-PDE-LEARN, a Partial Differential Equation (PDE) discovery algorithm that can identify non-linear PDEs from noisy, limited measurements of their solutions. Weak-PDE-LEARN uses an adaptive loss function based on weak forms to train a neural network, $U$, to approximate the PDE solution while simultaneously identifying the governing PDE. This approach yields an algorithm that is robust to noise and can discover a range of PDEs directly from noisy, limited measurements of their solutions. We demonstrate the efficacy of Weak-PDE-LEARN by learning several benchmark PDEs. △ Less

Submitted 9 September, 2023; originally announced September 2023.

Comments: 29 pages, 8 figures

arXiv:2212.04971 [pdf, other]

PDE-LEARN: Using Deep Learning to Discover Partial Differential Equations from Noisy, Limited Data

Authors: Robert Stephany, Christopher Earls

Abstract: In this paper, we introduce PDE-LEARN, a novel deep learning algorithm that can identify governing partial differential equations (PDEs) directly from noisy, limited measurements of a physical system of interest. PDE-LEARN uses a Rational Neural Network, $U$, to approximate the system response function and a sparse, trainable vector, $ξ$, to characterize the hidden PDE that the system response fun… ▽ More In this paper, we introduce PDE-LEARN, a novel deep learning algorithm that can identify governing partial differential equations (PDEs) directly from noisy, limited measurements of a physical system of interest. PDE-LEARN uses a Rational Neural Network, $U$, to approximate the system response function and a sparse, trainable vector, $ξ$, to characterize the hidden PDE that the system response function satisfies. Our approach couples the training of $U$ and $ξ$ using a loss function that (1) makes $U$ approximate the system response function, (2) encapsulates the fact that $U$ satisfies a hidden PDE that $ξ$ characterizes, and (3) promotes sparsity in $ξ$ using ideas from iteratively reweighted least-squares. Further, PDE-LEARN can simultaneously learn from several data sets, allowing it to incorporate results from multiple experiments. This approach yields a robust algorithm to discover PDEs directly from realistic scientific data. We demonstrate the efficacy of PDE-LEARN by identifying several PDEs from noisy and limited measurements. △ Less

Submitted 9 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 25 pages, 7 figures, 9 tables

arXiv:2111.00998 [pdf, other]

PDE-READ: Human-readable Partial Differential Equation Discovery using Deep Learning

Authors: Robert Stephany, Christopher Earls

Abstract: PDE discovery shows promise for uncovering predictive models of complex physical systems but has difficulty when measurements are sparse and noisy. We introduce a new approach for PDE discovery that uses two Rational Neural Networks and a principled sparse regression algorithm to identify the hidden dynamics that govern a system's response. The first network learns the system response function, wh… ▽ More PDE discovery shows promise for uncovering predictive models of complex physical systems but has difficulty when measurements are sparse and noisy. We introduce a new approach for PDE discovery that uses two Rational Neural Networks and a principled sparse regression algorithm to identify the hidden dynamics that govern a system's response. The first network learns the system response function, while the second learns a hidden PDE describing the system's evolution. We then use a parameter-free sparse regression algorithm to extract a human-readable form of the hidden PDE from the second network. We implement our approach in an open-source library called PDE-READ. Our approach successfully identifies the governing PDE in six benchmark examples. We demonstrate that our approach is robust to both sparsity and noise and it, therefore, holds promise for application to real-world observational data. △ Less

Submitted 17 June, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: 41 pages, 18 figures

arXiv:2108.04085 [pdf, other]

Bayesian Deep Learning for Partial Differential Equation Parameter Discovery with Sparse and Noisy Data

Authors: Christophe Bonneville, Christopher J. Earls

Abstract: Scientific machine learning has been successfully applied to inverse problems and PDE discovery in computational physics. One caveat concerning current methods is the need for large amounts of ("clean") data, in order to characterize the full system response and discover underlying physical models. Bayesian methods may be particularly promising for overcoming these challenges, as they are naturall… ▽ More Scientific machine learning has been successfully applied to inverse problems and PDE discovery in computational physics. One caveat concerning current methods is the need for large amounts of ("clean") data, in order to characterize the full system response and discover underlying physical models. Bayesian methods may be particularly promising for overcoming these challenges, as they are naturally less sensitive to the negative effects of sparse and noisy data. In this paper, we propose to use Bayesian neural networks (BNN) in order to: 1) Recover the full system states from measurement data (e.g. temperature, velocity field, etc.). We use Hamiltonian Monte-Carlo to sample the posterior distribution of a deep and dense BNN, and show that it is possible to accurately capture physics of varying complexity, without overfitting. 2) Recover the parameters instantiating the underlying partial differential equation (PDE) governing the physical system. Using the trained BNN, as a surrogate of the system response, we generate datasets of derivatives that are potentially comprising the latent PDE governing the observed system and then perform a sequential threshold Bayesian linear regression (STBLR), between the successive derivatives in space and time, to recover the original PDE parameters. We take advantage of the confidence intervals within the BNN outputs, and introduce the spatial derivatives cumulative variance into the STBLR likelihood, to mitigate the influence of highly uncertain derivative data points; thus allowing for more accurate parameter discovery. We demonstrate our approach on a handful of example, in applied physics and non-linear dynamics. △ Less

Submitted 29 November, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

arXiv:2105.00266 [pdf, other]

doi 10.1038/s41598-022-08745-5

Data-driven discovery of Green's functions with human-understandable deep learning

Authors: Nicolas Boullé, Christopher J. Earls, Alex Townsend

Abstract: There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human interpretable manner. To do this, we develop a novel data-driven approach for creating a human-machine partnership to accelerate scientific discovery. By collecting physical system responses under excitations drawn from a Gaussian process, we train rational neural networks to lear… ▽ More There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human interpretable manner. To do this, we develop a novel data-driven approach for creating a human-machine partnership to accelerate scientific discovery. By collecting physical system responses under excitations drawn from a Gaussian process, we train rational neural networks to learn Green's functions of hidden linear partial differential equations. These functions reveal human-understandable properties and features, such as linear conservation laws and symmetries, along with shock and singularity locations, boundary effects, and dominant modes. We illustrate the technique on several examples and capture a range of physics, including advection-diffusion, viscous shocks, and Stokes flow in a lid-driven cavity. △ Less

Submitted 11 March, 2022; v1 submitted 1 May, 2021; originally announced May 2021.

Comments: 54 pages, 23 figures

arXiv:2008.09687 [pdf, other]

doi 10.1016/j.finel.2021.103562

A Principled Approach to Design Using High Fidelity Fluid-Structure Interaction Simulations

Authors: Wensi Wu, Christophe Bonneville, Christopher J. Earls

Abstract: A high fidelity fluid-structure interaction simulation may require many days to run, on hundreds of cores. This poses a serious burden, both in terms of time and economic considerations, when repetitions of such simulations may be required (e.g. for the purpose of design optimization). In this paper we present strategies based on (constrained) Bayesian optimization (BO) to alleviate this burden. B… ▽ More A high fidelity fluid-structure interaction simulation may require many days to run, on hundreds of cores. This poses a serious burden, both in terms of time and economic considerations, when repetitions of such simulations may be required (e.g. for the purpose of design optimization). In this paper we present strategies based on (constrained) Bayesian optimization (BO) to alleviate this burden. BO is a numerical optimization technique based on Gaussian processes (GP) that is able to efficiently (with minimal calls to the expensive FSI models) converge towards some globally optimal design, as gauged using a black box objective function. In this study we present a principled design evolution that moves from FSI model verification, through a series of Bridge Simulations (bringing the verification case incrementally closer to the application), in order that we may identify material properties for an underwater, unmanned, autonomous vehicle (UUAV) sail plane. We are able to achieve fast convergence towards an optimal design, using a small number of FSI simulations (a dozen at most), even when selecting over several design parameters, and while respecting optimization constraints. △ Less

Submitted 21 August, 2020; originally announced August 2020.

arXiv:1905.07622 [pdf, other]

Analysis of heterogeneous computing approaches to simulating heat transfer in heterogeneous material

Authors: Andrew Loeb, Christopher Earls

Abstract: The simulation of heat flow through heterogeneous material is important for the design of structural and electronic components. Classical analytical solutions to the heat equation PDE are not known for many such domains, even those having simple geometries. The finite element method can provide approximations to a weak form continuum solution, with increasing accuracy as the number of degrees of f… ▽ More The simulation of heat flow through heterogeneous material is important for the design of structural and electronic components. Classical analytical solutions to the heat equation PDE are not known for many such domains, even those having simple geometries. The finite element method can provide approximations to a weak form continuum solution, with increasing accuracy as the number of degrees of freedom in the model increases. This comes at a cost of increased memory usage and computation time; even when taking advantage of sparse matrix techniques for the finite element system matrix. We summarize recent approaches in solving problems in structural mechanics and steady state heat conduction which do not require the explicit assembly of any system matrices, and adapt them to a method for solving the time-depended flow of heat. These approaches are highly parallelizable, and can be performed on graphical processing units (GPUs). Furthermore, they lend themselves to the simulation of heterogeneous material, with a minimum of added complexity. We present the mathematical framework of assembly-free FEM approaches, through which we summarize the benefits of GPU computation. We discuss our implementation using the OpenCL computing framework, and show how it is further adapted for use on multiple GPUs. We compare the performance of single and dual GPUs implementations of our method with previous GPU computing strategies from the literature and a CPU sparse matrix approach. The utility of the novel method is demonstrated through the solution of a real-world coefficient inverse problem that requires thousands of transient heat flow simulations, each of which involves solving a 1 million degree of freedom linear system over hundreds of time steps. △ Less

Submitted 18 May, 2019; originally announced May 2019.

Showing 1–14 of 14 results for author: Earls, C