-
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
Authors:
Daniel Kunin,
Giovanni Luca Marchetti,
Feng Chen,
Dhruva Karkada,
James B. Simon,
Michael R. DeWeese,
Surya Ganguli,
Nina Miolane
Abstract:
What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant. At each round, a dormant neuron activates, triggering the acquisition of a feature and a drop in the loss. AGF quantifies the order, timing, and magnitude of these drops, matching experiments across architectures. We show that AGF unifies and extends existing saddle-to-saddle analyses in fully connected linear networks and attention-only linear transformers, where the learned features are singular modes and principal components, respectively. In diagonal linear networks, we prove AGF converges to gradient flow in the limit of vanishing initialization. Applying AGF to quadratic networks trained to perform modular addition, we give the first complete characterization of the training dynamics, revealing that networks learn Fourier features in decreasing order of coefficient magnitude. Altogether, AGF offers a promising step towards understanding feature learning in neural networks.
Submitted 6 June, 2025;
originally announced June 2025.
-
Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG)
Authors:
Clayton Cohn,
Surya Rayala,
Caitlin Snyder,
Joyce Fonteles,
Shruti Jain,
Naveeduddin Mohammed,
Umesh Timalsina,
Sarah K. Burriss,
Ashwin T S,
Namrata Srivastava,
Menton Deweese,
Angela Eeds,
Gautam Biswas
Abstract:
Collaborative dialogue offers rich insights into students' learning and critical thinking. This is essential for adapting pedagogical agents to students' learning and problem-solving skills in STEM+C settings. While large language models (LLMs) facilitate dynamic pedagogical interactions, potential hallucinations can undermine confidence, trust, and instructional value. Retrieval-augmented generation (RAG) grounds LLM outputs in curated knowledge, but its effectiveness depends on clear semantic links between user input and a knowledge base, which are often weak in student dialogue. We propose log-contextualized RAG (LC-RAG), which enhances RAG retrieval by incorporating environment logs to contextualize collaborative discourse. Our findings show that LC-RAG improves retrieval over a discourse-only baseline and allows our collaborative peer agent, Copa, to deliver relevant, personalized guidance that supports students' critical thinking and epistemic decision-making in a collaborative computational modeling environment, XYZ.
Submitted 22 May, 2025;
originally announced May 2025.
-
Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
Authors:
Dhruva Karkada,
James B. Simon,
Yasaman Bahri,
Michael R. DeWeese
Abstract:
Self-supervised word embedding algorithms such as word2vec provide a minimal setting for studying representation learning in language modeling. We examine the quartic Taylor approximation of the word2vec loss around the origin, and we show that both the resulting training dynamics and the final performance on downstream tasks are empirically very similar to those of word2vec. Our main contribution is to analytically solve for both the gradient flow training dynamics and the final word embeddings in terms of only the corpus statistics and training hyperparameters. The solutions reveal that these models learn orthogonal linear subspaces one at a time, each one incrementing the effective rank of the embeddings until model capacity is saturated. Training on Wikipedia, we find that each of the top linear subspaces represents an interpretable topic-level concept. Finally, we apply our theory to describe how linear representations of more abstract semantic concepts emerge during training; these can be used to complete analogies via vector addition.
Submitted 28 May, 2025; v1 submitted 13 February, 2025;
originally announced February 2025.
-
Beyond Linear Response: Equivalence between Thermodynamic Geometry and Optimal Transport
Authors:
Adrianne Zhong,
Michael R. DeWeese
Abstract:
A fundamental result of thermodynamic geometry is that the optimal, minimal-work protocol that drives a nonequilibrium system between two thermodynamic states in the slow-driving limit is given by a geodesic of the friction tensor, a Riemannian metric defined on control space. For overdamped dynamics in arbitrary dimensions, we demonstrate that thermodynamic geometry is equivalent to $L^2$ optimal transport geometry defined on the space of equilibrium distributions corresponding to the control parameters. We show that obtaining optimal protocols past the slow-driving or linear response regime is computationally tractable as the sum of a friction tensor geodesic and a counterdiabatic term related to the Fisher information metric. These geodesic-counterdiabatic optimal protocols are exact for parametric harmonic potentials, reproduce the surprising non-monotonic behavior recently discovered in linearly-biased double well optimal protocols, and explain the ubiquitous discontinuous jumps observed at the beginning and end times.
Submitted 25 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Time-Asymmetric Fluctuation Theorem and Efficient Free Energy Estimation
Authors:
Adrianne Zhong,
Benjamin Kuznets-Speck,
Michael R. DeWeese
Abstract:
The free-energy difference $ΔF$ between two high-dimensional systems is notoriously difficult to compute, but very important for many applications, such as drug discovery. We demonstrate that an unconventional definition of work introduced by Vaikuntanathan and Jarzynski (2008) satisfies a microscopic fluctuation theorem that relates path ensembles that are driven by protocols unequal under time-reversal. It has been shown before that counterdiabatic protocols -- those having additional forcing that constrains the system to remain in instantaneous equilibrium, also known as escorted dynamics or engineered swift equilibration -- yield zero-variance work measurements for this definition. We show that this time-asymmetric microscopic fluctuation theorem can be exploited for efficient free energy estimation by developing a simple (i.e., neural-network free) and efficient adaptive time-asymmetric protocol optimization algorithm that yields $ΔF$ estimates that are orders of magnitude lower in mean squared error than the generic linear interpolation protocol with which it is initialized.
Submitted 15 December, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Shortcut engineering of active matter: run-and-tumble particles
Authors:
Adam G. Frim,
Michael R. DeWeese
Abstract:
Shortcut engineering consists of a class of approaches to rapidly manipulate physical systems by means of specially designed external controls. In this Letter, we apply these approaches to run-and-tumble particles, which are designed to mimic the chemotactic behavior of bacteria and therefore exhibit complex dynamics due to their self-propulsion and random reorientation, making them difficult to control. Following a recent successful application to active Brownian particles, we find a general solution for the rapid control of 1D run-and-tumble particles in a harmonic potential. We demonstrate the effectiveness of our approach using numerical simulations and show that it can lead to a significant speedup compared to simple quenched protocols. Our results extend shortcut engineering to a wider class of active systems and demonstrate that it is a promising tool for controlling the dynamics of active matter, which has implications for a wide range of applications in fields such as materials science and biophysics.
Submitted 12 April, 2023;
originally announced April 2023.
-
Limited-control optimal protocols arbitrarily far from equilibrium
Authors:
Adrianne Zhong,
Michael R. DeWeese
Abstract:
Recent studies have explored finite-time dissipation-minimizing protocols for stochastic thermodynamic systems driven arbitrarily far from equilibrium, when granted full external control to drive the system. However, in both simulation and experimental contexts, systems often may only be controlled with a limited set of degrees of freedom. Here, going beyond slow- and fast-driving approximations employed in previous studies, we obtain exact finite-time optimal protocols for this unexplored limited-control setting. By working with deterministic Fokker-Planck probability density time evolution, we can frame the work-minimizing protocol problem in the standard form of an optimal control theory problem. We demonstrate that finding the exact optimal protocol is equivalent to solving a system of Hamiltonian partial differential equations, which in many cases admit efficiently calculable numerical solutions. Within this framework, we reproduce analytical results for the optimal control of harmonic potentials, and numerically devise novel optimal protocols for two anharmonic examples: varying the stiffness of a quartic potential, and linearly biasing a double-well potential. We confirm that these optimal protocols outperform other protocols produced through previous methods, in some cases by a substantial amount. We find that for the linearly biased double-well problem, the mean position under the optimal protocol travels at a near-constant velocity. Surprisingly, for a certain timescale and barrier height regime, the optimal protocol is also non-monotonic in time.
Submitted 8 July, 2022; v1 submitted 17 May, 2022;
originally announced May 2022.
-
A geometric bound on the efficiency of irreversible thermodynamic cycles
Authors:
Adam G. Frim,
Michael R. DeWeese
Abstract:
Stochastic thermodynamics has revolutionized our understanding of heat engines operating in finite time. Recently, numerous studies have considered the optimal operation of thermodynamic cycles acting as heat engines with a given profile in thermodynamic space (e.g. $P-V$ space in classical thermodynamics), with a particular focus on the Carnot engine. In this work, we use the lens of thermodynamic geometry to explore the full space of thermodynamic cycles with continuously-varying bath temperature in search of optimally shaped cycles acting in the slow-driving regime. We apply classical isoperimetric inequalities to derive a universal geometric bound on the efficiency of any irreversible thermodynamic cycle and explicitly construct efficient heat engines operating in finite time that nearly saturate this bound for a specific model system. Given the bound, these optimal cycles perform more efficiently than all other thermodynamic cycles operating as heat engines in finite time, including notable cycles, such as those of Carnot, Stirling, and Otto. For example, in comparison to recent experiments, this corresponds to orders of magnitude improvement in the efficiency of engines operating in certain time regimes. Our results suggest novel design principles for future mesoscopic heat engines and are ripe for experimental investigation.
Submitted 14 June, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Stochastic optimization for learning quantum state feedback control
Authors:
Ethan N. Evans,
Ziyi Wang,
Adam G. Frim,
Michael R. DeWeese,
Evangelos A. Theodorou
Abstract:
High fidelity state preparation represents a fundamental challenge in the application of quantum technology. While the majority of optimal control approaches use feedback to improve the controller, the controller itself often does not incorporate explicit state dependence. Here, we present a general framework for training deep feedback networks for open quantum systems with quantum nondemolition measurement that allows a variety of system and control structures that are prohibitive for many other techniques and can in effect react to unmodeled effects through nonlinear filtering. We demonstrate that this method is efficient due to inherent parallelizability, robust to open system interactions, and outperforms landmark state feedback control results in simulation.
Submitted 18 November, 2021;
originally announced November 2021.
-
The Eigenlearning Framework: A Conservation Law Perspective on Kernel Regression and Wide Neural Networks
Authors:
James B. Simon,
Madeline Dickens,
Dhruva Karkada,
Michael R. DeWeese
Abstract:
We derive simple closed-form estimates for the test risk and other generalization metrics of kernel ridge regression (KRR). Relative to prior work, our derivations are greatly simplified and our final expressions are more readily interpreted. These improvements are enabled by our identification of a sharp conservation law which limits the ability of KRR to learn any orthonormal basis of functions. Test risk and other objects of interest are expressed transparently in terms of our conserved quantity evaluated in the kernel eigenbasis. We use our improved framework to: i) provide a theoretical explanation for the "deep bootstrap" of Nakkiran et al (2020), ii) generalize a previous result regarding the hardness of the classic parity problem, iii) fashion a theoretical tool for the study of adversarial robustness, and iv) draw a tight analogy between KRR and a well-studied system in statistical physics.
Submitted 26 October, 2023; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Optimal finite-time Brownian Carnot engine
Authors:
Adam G. Frim,
Michael R. DeWeese
Abstract:
Recent advances in experimental control of colloidal systems have spurred a revolution in the production of mesoscale thermodynamic devices. Functional "textbook" engines, such as the Stirling and Carnot cycles, have been produced in colloidal systems where they operate far from equilibrium. Simultaneously, significant theoretical advances have been made in the design and analysis of such devices. Here, we use methods from thermodynamic geometry to characterize the optimal finite-time, nonequilibrium cyclic operation of the parametric harmonic oscillator in contact with a time-varying heat bath, with particular focus on the Brownian Carnot cycle. We derive the optimally parametrized Carnot cycle, along with two other new cycles, and compare their dissipated energy, efficiency, and steady-state power production against each other and a previously tested experimental protocol for the Carnot cycle. We demonstrate a 20\% improvement in dissipated energy over previous experimentally tested protocols and a $\sim$50\% improvement under other conditions for one of our engines, while our final engine is more efficient and powerful than the others we considered. Our results provide the means for experimentally realizing optimal mesoscale heat engines.
Submitted 14 June, 2022; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Reverse Engineering the Neural Tangent Kernel
Authors:
James B. Simon,
Sajant Anand,
Michael R. DeWeese
Abstract:
The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. To this end, we constructively prove that, with just an appropriate choice of activation function, any positive-semidefinite dot-product kernel can be realized as either the NNGP or neural tangent kernel of a fully-connected neural network with only one hidden layer. We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments.
Submitted 13 August, 2022; v1 submitted 6 June, 2021;
originally announced June 2021.
-
Engineered swift equilibration for arbitrary geometries
Authors:
Adam G. Frim,
Adrianne Zhong,
Shi-Fan Chen,
Dibyendu Mandal,
Michael R. DeWeese
Abstract:
Engineered swift equilibration (ESE) is a class of driving protocols that enforce an equilibrium distribution with respect to external control parameters at the beginning and end of rapid state transformations of open, classical non-equilibrium systems. ESE protocols have previously been derived and experimentally realized for Brownian particles in simple, one-dimensional, time-varying trapping potentials; one recent study considered ESE in two-dimensional Euclidean configuration space. Here we extend the ESE framework to generic, overdamped Brownian systems in arbitrary curved configuration space and illustrate our results with specific examples not amenable to previous techniques. Our approach may be used to impose the necessary dynamics to control the full temporal configurational distribution in a wide variety of experimentally realizable settings.
Submitted 1 February, 2022; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Solution to the Fokker-Planck equation for slowly driven Brownian motion: Emergent geometry and a formula for the corresponding thermodynamic metric
Authors:
Neha S. Wadia,
Ryan V. Zarcone,
Michael R. DeWeese
Abstract:
Considerable progress has recently been made with geometrical approaches to understanding and controlling small out-of-equilibrium systems, but a mathematically rigorous foundation for these methods has been lacking. Towards this end, we develop a perturbative solution to the Fokker-Planck equation for one-dimensional driven Brownian motion in the overdamped limit enabled by the spectral properties of the corresponding single-particle Schrödinger operator. The perturbation theory is in powers of the inverse characteristic timescale of variation of the fastest varying control parameter, measured in units of the system timescale, which is set by the smallest eigenvalue of the corresponding Schrödinger operator. It applies to any Brownian system for which the Schrödinger operator has a confining potential. We use the theory to rigorously derive an exact formula for a Riemannian "thermodynamic" metric in the space of control parameters of the system. We show that up to second-order terms in the perturbation theory, optimal dissipation-minimizing driving protocols minimize the length defined by this metric. We also show that a previously proposed metric is calculable from our exact formula with corrections that are exponentially suppressed in a characteristic length scale. We illustrate our formula using the two-dimensional example of a harmonic oscillator with time-dependent spring constant in a time-dependent electric field. Lastly, we demonstrate that the Riemannian geometric structure of the optimal control problem is emergent; it derives from the form of the perturbative expansion for the probability density and persists to all orders of the expansion.
Submitted 5 April, 2022; v1 submitted 31 July, 2020;
originally announced August 2020.
-
A new method for parameter estimation in probabilistic models: Minimum probability flow
Authors:
Jascha Sohl-Dickstein,
Peter Battaglino,
Michael R. DeWeese
Abstract:
Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function. We propose a new parameter fitting method, Minimum Probability Flow (MPF), which is applicable to any parametric model. We demonstrate parameter estimation using MPF in two cases: a continuous state space model, and an Ising spin glass. In the latter case it outperforms current techniques by at least an order of magnitude in convergence time with lower error in the recovered coupling parameters.
Submitted 17 July, 2020;
originally announced July 2020.
-
Critical Point-Finding Methods Reveal Gradient-Flat Regions of Deep Network Losses
Authors:
Charles G. Frye,
James Simon,
Neha S. Wadia,
Andrew Ligeralde,
Michael R. DeWeese,
Kristofer E. Bouchard
Abstract:
Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-based optimization algorithms converge to approximately the same performance from many random initial points. One thread of work has focused on explaining this phenomenon by characterizing the local curvature near critical points of the loss function, where the gradients are near zero, and demonstrating that neural network losses enjoy a no-bad-local-minima property and an abundance of saddle points. We report here that the methods used to find these putative critical points suffer from a bad local minima problem of their own: they often converge to or pass through regions where the gradient norm has a stationary point. We call these gradient-flat regions, since they arise when the gradient is approximately in the kernel of the Hessian, such that the loss is locally approximately linear, or flat, in the direction of the gradient. We describe how the presence of these regions necessitates care in both interpreting past results that claimed to find critical points of neural network losses and in designing second-order methods for optimizing neural networks.
Submitted 23 March, 2020;
originally announced March 2020.
-
Design of optical neural networks with component imprecisions
Authors:
Michael Y. -S. Fang,
Sasikanth Manipatruni,
Casimir Wierzynski,
Amir Khosrowshahi,
Michael R. DeWeese
Abstract:
For the benefit of designing scalable, fault resistant optical neural networks (ONNs), we investigate the effects architectural designs have on the ONNs' robustness to imprecise components. We train two ONNs -- one with a more tunable design (GridNet) and one with better fault tolerance (FFTNet) -- to classify handwritten digits. When simulated without any imperfections, GridNet yields a better accuracy (~98%) than FFTNet (~95%). However, under a small amount of error in their photonic components, the more fault tolerant FFTNet overtakes GridNet. We further provide thorough quantitative and qualitative analyses of ONNs' sensitivity to varying levels and types of imprecisions. Our results offer guidelines for the principled design of fault-tolerant ONNs as well as a foundation for further research.
Submitted 13 December, 2019;
originally announced January 2020.
-
Thermodynamic Computing
Authors:
Tom Conte,
Erik DeBenedictis,
Natesh Ganesh,
Todd Hylton,
John Paul Strachan,
R. Stanley Williams,
Alexander Alemi,
Lee Altenberg,
Gavin Crooks,
James Crutchfield,
Lidia del Rio,
Josh Deutsch,
Michael DeWeese,
Khari Douglas,
Massimiliano Esposito,
Michael Frank,
Robert Fry,
Peter Harsha,
Mark Hill,
Christopher Kello,
Jeff Krichmar,
Suhas Kumar,
Shih-Chii Liu,
Seth Lloyd,
Matteo Marsili
, et al. (14 additional authors not shown)
Abstract:
The hardware and software foundations laid in the first half of the 20th Century enabled the computing technologies that have transformed the world, but these foundations are now under siege. The current computing paradigm, which is the foundation of much of the current standards of living that we now enjoy, faces fundamental limitations that are evident from several perspectives. In terms of hardware, devices have become so small that we are struggling to eliminate the effects of thermodynamic fluctuations, which are unavoidable at the nanometer scale. In terms of software, our ability to imagine and program effective computational abstractions and implementations is clearly challenged in complex domains. In terms of systems, currently five percent of the power generated in the US is used to run computing systems - this astonishing figure is neither ecologically sustainable nor economically scalable. Economically, the cost of building next-generation semiconductor fabrication plants has soared past $10 billion. All of these difficulties - device scaling, software complexity, adaptability, energy consumption, and fabrication economics - indicate that the current computing paradigm has matured and that continued improvements along this path will be limited. If technological progress is to continue and corresponding social and economic benefits are to continue to accrue, computing must become much more capable, energy efficient, and affordable. We propose that progress in computing can continue under a united, physically grounded, computational paradigm centered on thermodynamics. Herein we propose a research agenda to extend these thermodynamic foundations into complex, non-equilibrium, self-organizing systems and apply them holistically to future computing systems that will harness nature's innate computational capacity. We call this type of computing "Thermodynamic Computing" or TC.
Submitted 14 November, 2019; v1 submitted 5 November, 2019;
originally announced November 2019.
-
Numerically Recovering the Critical Points of a Deep Linear Autoencoder
Authors:
Charles G. Frye,
Neha S. Wadia,
Michael R. DeWeese,
Kristofer E. Bouchard
Abstract:
Numerically locating the critical points of non-convex surfaces is a long-standing problem central to many fields. Recently, the loss surfaces of deep neural networks have been explored to gain insight into outstanding questions in optimization, generalization, and network architecture design. However, the degree to which recently-proposed methods for numerically recovering critical points actually do so has not been thoroughly evaluated. In this paper, we examine this issue in a case for which the ground truth is known: the deep linear autoencoder. We investigate two sub-problems associated with numerical critical point identification: first, because of large parameter counts, it is infeasible to find all of the critical points for contemporary neural networks, necessitating sampling approaches whose characteristics are poorly understood; second, the numerical tolerance for accurately identifying a critical point is unknown, and conservative tolerances are difficult to satisfy. We first identify connections between recently-proposed methods and well-understood methods in other fields, including chemical physics, economics, and algebraic geometry. We find that several methods work well at recovering certain information about loss surfaces, but fail to take an unbiased sample of critical points. Furthermore, numerical tolerance must be very strict to ensure that numerically-identified critical points have similar properties to true analytical critical points. We also identify a recently-published Newton method for optimization that outperforms previous methods as a critical point-finding algorithm. We expect our results will guide future attempts to numerically study critical points in large nonlinear neural networks.
Submitted 29 January, 2019;
originally announced January 2019.
-
Reply to Comment on 'Entropy production and fluctuation theorems for active matter'
Authors:
Dibyendu Mandal,
Katherine Klymko,
Michael R. DeWeese
Abstract:
This is a reply to the comment to a letter by D. Mandal, K. Klymko and M. R. DeWeese published as Phys. Rev. Lett. 119, 258001 (2017).
Submitted 10 September, 2018;
originally announced September 2018.
-
Entropy production and fluctuation theorems for active matter
Authors:
Dibyendu Mandal,
Katherine Klymko,
Michael R. DeWeese
Abstract:
Active biological systems reside far from equilibrium, dissipating heat even in their steady state, thus requiring an extension of conventional equilibrium thermodynamics and statistical mechanics. In this Letter, we have extended the emerging framework of stochastic thermodynamics to active matter. In particular, for the active Ornstein-Uhlenbeck model, we have provided consistent definitions of thermodynamic quantities such as work, energy, heat, entropy, and entropy production at the level of single, stochastic trajectories and derived related fluctuation relations. We have developed a generalization of the Clausius inequality, which is valid even in the presence of the non-Hamiltonian dynamics underlying active matter systems. We have illustrated our results with explicit numerical studies.
Submitted 7 April, 2017;
originally announced April 2017.
-
Nonequilibrium work energy relation for non-Hamiltonian dynamics
Authors:
Dibyendu Mandal,
Michael R. DeWeese
Abstract:
Recent years have witnessed major advances in our understanding of nonequilibrium processes. The Jarzynski equality, for example, provides a link between equilibrium free energy differences and finite-time, nonequilibrium dynamics. We propose a generalization of this relation to non-Hamiltonian dynamics, relevant for active matter systems, continuous feedback, and computer simulation. Surprisingly, this relation allows us to calculate the free energy difference between the desired initial and final equilibrium states using arbitrary dynamics. As a practical matter, this dissociation between the dynamics and the initial and final states promises to facilitate a range of techniques for free energy estimation in a single, universal expression.
Submitted 25 April, 2016; v1 submitted 17 September, 2015;
originally announced September 2015.
-
A Markov Jump Process for More Efficient Hamiltonian Monte Carlo
Authors:
Andrew B. Berger,
Mayur Mudigonda,
Michael R. DeWeese,
Jascha Sohl-Dickstein
Abstract:
In most sampling algorithms, including Hamiltonian Monte Carlo, transition rates between states correspond to the probability of making a transition in a single time step, and are constrained to be less than or equal to 1. We derive a Hamiltonian Monte Carlo algorithm using a continuous time Markov jump process, and are thus able to escape this constraint. Transition rates in a Markov jump process need only be non-negative. We demonstrate that the new algorithm leads to improved mixing for several example problems, both by evaluating the spectral gap of the Markov operator, and by computing autocorrelation as a function of compute time. We release the algorithm as an open source Python package.
Submitted 11 October, 2015; v1 submitted 13 September, 2015;
originally announced September 2015.
-
Optimal protocols for slowly-driven quantum processes
Authors:
Patrick R. Zulkowski,
Michael R. DeWeese
Abstract:
The design of efficient quantum information processing will rely on optimal nonequilibrium transitions of driven quantum systems. Building on a recently-developed geometric framework for computing optimal protocols for classical systems driven in finite-time, we construct a general framework for optimizing the average information entropy for driven quantum systems. Geodesics on the parameter manifold endowed with a positive semi-definite metric correspond to protocols that minimize the average information entropy production in finite-time. We use this framework to explicitly compute the optimal entropy production for a simple two-state quantum system coupled to a heat bath of bosonic oscillators, which has applications to quantum annealing.
Submitted 26 August, 2015; v1 submitted 11 June, 2015;
originally announced June 2015.
-
Optimal Control of Overdamped Systems
Authors:
Patrick R. Zulkowski,
Michael R. DeWeese
Abstract:
Nonequilibrium physics encompasses a broad range of natural and synthetic small-scale systems. Optimizing transitions of such systems will be crucial for the development of nanoscale technologies and may reveal the physical principles underlying biological processes at the molecular level. Recent work has demonstrated that when a thermodynamic system is driven away from equilibrium then the space of controllable parameters has a Riemannian geometry induced by a generalized inverse diffusion tensor. We derive a simple, compact expression for the inverse diffusion tensor that depends solely on equilibrium information for a broad class of potentials. We use this formula to compute the minimal dissipation for two model systems relevant to small-scale information processing and biological molecular motors. In the first model, we optimally erase a single classical bit of information modelled by an overdamped particle in a smooth double-well potential. In the second model, we find the minimal dissipation of a simple molecular motor model coupled to an optical trap. In both models, we find that the minimal dissipation for the optimal protocol is inversely proportional to protocol duration, as expected, though the dissipation for the erasure model takes a different form than what we found previously for a similar system.
Submitted 31 August, 2015; v1 submitted 11 June, 2015;
originally announced June 2015.
-
Time Resolution Dependence of Information Measures for Spiking Neurons: Atoms, Scaling, and Universality
Authors:
Sarah E. Marzen,
Michael R. DeWeese,
James P. Crutchfield
Abstract:
The mutual information between stimulus and spike-train response is commonly used to monitor neural coding efficiency, but neuronal computation broadly conceived requires more refined and targeted information measures of input-output joint processes. A first step towards that larger goal is to develop information measures for individual output processes, including information generation (entropy rate), stored information (statistical complexity), predictable information (excess entropy), and active information accumulation (bound information rate). We calculate these for spike trains generated by a variety of noise-driven integrate-and-fire neurons as a function of time resolution and for alternating renewal processes. We show that their time-resolution dependence reveals coarse-grained structural properties of interspike interval statistics; e.g., $τ$-entropy rates that diverge less quickly than the firing rate indicate interspike interval correlations. We also find evidence that the excess entropy and regularized statistical complexity of different types of integrate-and-fire neurons are universal in the continuous-time limit in the sense that they do not depend on mechanism details. This suggests a surprising simplicity in the spike trains generated by these model neurons. Interestingly, neurons with gamma-distributed ISIs and neurons whose spike trains are alternating renewal processes do not fall into the same universality class. These results lead to two conclusions. First, the dependence of information measures on time resolution reveals mechanistic details about spike train generation. Second, information measures can be used as model selection tools for analyzing spike train processes.
Submitted 18 April, 2015;
originally announced April 2015.
-
Hamiltonian Monte Carlo Without Detailed Balance
Authors:
Jascha Sohl-Dickstein,
Mayur Mudigonda,
Michael R. DeWeese
Abstract:
We present a method for performing Hamiltonian Monte Carlo that largely eliminates sample rejection for typical hyperparameters. In situations that would normally lead to rejection, instead a longer trajectory is computed until a new state is reached that can be accepted. This is achieved using Markov chain transitions that satisfy the fixed point equation, but do not satisfy detailed balance. The resulting algorithm significantly suppresses the random walk behavior and wasted function evaluations that are typically the consequence of update rejection. We demonstrate a greater than factor of two improvement in mixing time on three test problems. We release the source code as Python and MATLAB packages.
Submitted 25 March, 2016; v1 submitted 18 September, 2014;
originally announced September 2014.
-
Optimal finite-time erasure of a classical bit
Authors:
Patrick R. Zulkowski,
Michael R. DeWeese
Abstract:
Information erasure inevitably leads to heat dissipation. Minimizing this dissipation will be crucial for developing small-scale information processing systems, but little is known about the optimal procedures required. We have obtained closed-form expressions for maximally efficient erasure cycles for deletion of a classical bit of information stored by the position of a particle diffusing in a double-well potential. We find that the extra dissipation beyond the Landauer bound is proportional to the square of the Hellinger distance between the initial and final states divided by the cycle duration, which quantifies how far out of equilibrium the system is driven. Finally, we demonstrate close agreement between the exact optimal cycle and the protocol found using a linear response framework.
Submitted 16 October, 2013; v1 submitted 15 October, 2013;
originally announced October 2013.
-
Optimal control of transitions between nonequilibrium steady states
Authors:
Patrick R. Zulkowski,
David A. Sivak,
Michael R. DeWeese
Abstract:
Biological systems fundamentally exist out of equilibrium in order to preserve organized structures and processes. Many changing cellular conditions can be represented as transitions between nonequilibrium steady states, and organisms have an interest in optimizing such transitions. Using the Hatano-Sasa Y-value, we extend a recently developed geometrical framework for determining optimal protocols so that it can be applied to systems driven from nonequilibrium steady states. We calculate and numerically verify optimal protocols for a colloidal particle dragged through solution by a translating optical trap with two controllable parameters. We offer experimental predictions, specifically that optimal protocols are significantly less costly than naive ones. Optimal protocols similar to these may ultimately point to design principles for biological energy transduction systems and guide the design of artificial molecular machines.
Submitted 29 October, 2013; v1 submitted 26 March, 2013;
originally announced March 2013.
-
Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus
Authors:
Nicole L. Carlson,
Vivienne L. Ming,
Michael R. DeWeese
Abstract:
We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus and cortex, and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds.
Submitted 22 September, 2012;
originally announced September 2012.
-
Minimum and maximum entropy distributions for binary systems with known means and pairwise correlations
Authors:
Badr F. Albanna,
Christopher Hillar,
Jascha Sohl-Dickstein,
Michael R. DeWeese
Abstract:
Maximum entropy models are increasingly being used to describe the collective activity of neural populations with measured mean neural activities and pairwise correlations, but the full space of probability distributions consistent with these constraints has not been explored. We provide upper and lower bounds on the entropy for the minimum entropy distribution over arbitrarily large collections of binary units with any fixed set of mean values and pairwise correlations. We also construct specific low-entropy distributions for several relevant cases. Surprisingly, the minimum entropy solution has entropy scaling logarithmically with system size for any set of first- and second-order statistics consistent with arbitrarily large systems. We further demonstrate that some sets of these low-order statistics can only be realized by small systems. Our results show how only small amounts of randomness are needed to mimic low-order statistical properties of highly entropic distributions, and we discuss some applications for engineered and biological information transmission systems.
Submitted 21 August, 2017; v1 submitted 17 September, 2012;
originally announced September 2012.
-
Dead leaves and the dirty ground: low-level image statistics in transmissive and occlusive imaging environments
Authors:
Joel Zylberberg,
David Pfau,
Michael Robert DeWeese
Abstract:
The opacity of typical objects in the world results in occlusion --- an important property of natural scenes that makes inference of the full 3-dimensional structure of the world challenging. The relationship between occlusion and low-level image statistics has been hotly debated in the literature, and extensive simulations have been used to determine whether occlusion is responsible for the ubiquitously observed power-law power spectra of natural images. To deepen our understanding of this problem, we have analytically computed the 2- and 4-point functions of a generalized "dead leaves" model of natural images with parameterized object transparency. Surprisingly, transparency alters these functions only by a multiplicative constant, so long as object diameters follow a power law distribution. For other object size distributions, transparency more substantially affects the low-level image statistics. We propose that the universality of power law power spectra for both natural scenes and radiological medical images -- formed by the transmission of x-rays through partially transparent tissue -- stems from power law object size distributions, independent of object opacity.
Submitted 5 December, 2012; v1 submitted 14 September, 2012;
originally announced September 2012.
-
The geometry of thermodynamic control
Authors:
Patrick R. Zulkowski,
David A. Sivak,
Gavin E. Crooks,
Michael R. DeWeese
Abstract:
A deeper understanding of nonequilibrium phenomena is needed to reveal the principles governing natural and synthetic molecular machines. Recent work has shown that when a thermodynamic system is driven from equilibrium then, in the linear response regime, the space of controllable parameters has a Riemannian geometry induced by a generalized friction tensor. We exploit this geometric insight to construct closed-form expressions for minimal-dissipation protocols for a particle diffusing in a one dimensional harmonic potential, where the spring constant, inverse temperature, and trap location are adjusted simultaneously. These optimal protocols are geodesics on the Riemannian manifold, and reveal that this simple model has a surprisingly rich geometry. We test these optimal protocols via a numerical implementation of the Fokker-Planck equation and demonstrate that the friction tensor arises naturally from a first order expansion in temporal derivatives of the control parameters, without appealing directly to linear response theory.
Submitted 9 October, 2012; v1 submitted 22 August, 2012;
originally announced August 2012.
-
A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of V1 simple cell receptive fields
Authors:
Joel Zylberberg,
Jason Timothy Murphy,
Michael Robert DeWeese
Abstract:
Sparse coding algorithms trained on natural images can accurately predict the features that excite visual cortical neurons, but it is not known whether such codes can be learned using biologically realistic plasticity rules. We have developed a biophysically motivated spiking network, relying solely on synaptically local information, that can predict the full diversity of V1 simple cell receptive field shapes when trained on natural images. This represents the first demonstration that sparse coding principles, operating within the constraints imposed by cortical architecture, can successfully reproduce these receptive fields. We further prove, mathematically, that sparseness and decorrelation are the key ingredients that allow for synaptically local plasticity rules to optimize a cooperative, linear generative image model formed by the neural representation. Finally, we discuss several interesting emergent properties of our network, with the intent of bridging the gap between theoretical and experimental studies of visual cortex.
Submitted 10 September, 2011;
originally announced September 2011.
-
How should prey animals respond to uncertain threats?
Authors:
Joel Zylberberg,
Michael R. DeWeese
Abstract:
A prey animal surveying its environment must decide whether there is a dangerous predator present or not. If there is, it may flee. Flight has an associated cost, so the animal should not flee if there is no danger. However, the prey animal cannot know the state of its environment with certainty, and is thus bound to make some errors. We formulate a probabilistic automaton model of a prey animal's life and use it to compute the optimal escape decision strategy, subject to the animal's uncertainty. The uncertainty is a major factor in determining the decision strategy: only in the presence of uncertainty do economic factors (like mating opportunities lost due to flight) influence the decision. We performed computer simulations and found that in silico populations of animals subject to predation evolve to display the strategies predicted by our model, confirming our choice of objective function for our analytic calculations. To the best of our knowledge, this is the first theoretical study of escape decisions to incorporate the effects of uncertainty, and to demonstrate the correctness of the objective function used in the model.
Submitted 19 April, 2011;
originally announced April 2011.
-
Minimum Probability Flow Learning
Authors:
Jascha Sohl-Dickstein,
Peter Battaglino,
Michael R. DeWeese
Abstract:
Fitting probabilistic models to data is often difficult, due to the general intractability of the partition function and its derivatives. Here we propose a new parameter estimation technique that does not require computing an intractable normalization factor or sampling from the equilibrium distribution of the model. This is achieved by establishing dynamics that would transform the observed data distribution into the model distribution, and then setting as the objective the minimization of the KL divergence between the data distribution and the distribution produced by running the dynamics for an infinitesimal time. Score matching, minimum velocity learning, and certain forms of contrastive divergence are shown to be special cases of this learning technique. We demonstrate parameter estimation in Ising models, deep belief networks and an independent component analysis model of natural scenes. In the Ising model case, current state of the art techniques are outperformed by at least an order of magnitude in learning time, with lower error in recovered coupling parameters.
Submitted 24 September, 2011; v1 submitted 25 June, 2009;
originally announced June 2009.