-
Grokking vs. Learning: Same Features, Different Encodings
Authors:
Dmitry Manning-Coe,
Jacopo Gliozzi,
Alexander G. Stapleton,
Edward Hirst,
Giuseppe De Tomasi,
Barry Bradlyn,
David S. Berman
Abstract:
Grokking typically achieves similar loss to ordinary, "steady", learning. We ask whether these different learning paths (grokking versus ordinary training) lead to fundamental differences in the learned models. To do so, we compare the features, compressibility, and learning dynamics of models trained via each path on two tasks. We find that grokked and steadily trained models learn the same features, but there can be large differences in the efficiency with which these features are encoded. In particular, we find a novel "compressive regime" of steady training in which a linear trade-off between model loss and compressibility emerges, and which is absent in grokking. In this regime, we can achieve compression factors of 25x relative to the base model, and 5x the compression achieved in grokking. We then track how model features and compressibility develop through training. We show that model development in grokking is task-dependent, and that peak compressibility is achieved immediately after the grokking plateau. Finally, we introduce novel information-geometric measures which demonstrate that models undergoing grokking follow a straight path in information space.
Submitted 3 February, 2025;
originally announced February 2025.
-
Three-dimensional nucleation and growth of deformation twins in magnesium
Authors:
Sangwon Lee,
Michael Pilipchuk,
Can Yildirim,
Duncan Greeley,
Qianying Shi,
Tracy D. Berman,
Adam Creuziger,
Evan Rust,
Carsten Detlefs,
Veera Sundararaghavan,
John E. Allison,
Ashley Bucsek
Abstract:
At two-thirds the weight of aluminum, magnesium alloys have the potential to significantly reduce the fuel consumption of transportation vehicles. These advancements depend on our ability to optimize the desirable versus undesirable effects of deformation twins: three-dimensional (3D) microstructural domains that form under mechanical stresses. Deformation twins have previously been characterized only with surface or thin-film measurements; here, we present the first 3D in-situ characterization of deformation twinning inside an embedded grain over mesoscopic fields of view, using dark-field X-ray microscopy supported by crystal plasticity finite element analysis. The results reveal the important role of triple junctions in twin nucleation, show that twin growth is irregular and can occur in several directions simultaneously, and identify twin-grain and twin-twin junctions as sites of localized dislocation accumulation, a necessary precursor to crack initiation.
Submitted 21 December, 2024;
originally announced December 2024.
-
NCoder -- A Quantum Field Theory approach to encoding data
Authors:
David S. Berman,
Marc S. Klinger,
Alexander G. Stapleton
Abstract:
In this paper we present a novel approach to interpretable AI inspired by Quantum Field Theory (QFT) which we call the NCoder. The NCoder is a modified autoencoder neural network whose latent layer is prescribed to be a subset of $n$-point correlation functions. Regarding images as draws from a lattice field theory, this architecture mimics the task of perturbatively constructing the effective action of the theory order by order in an expansion using Feynman diagrams. Alternatively, the NCoder may be regarded as simulating the procedure of statistical inference whereby high-dimensional data are first summarized in terms of several lower-dimensional summary statistics (here the $n$-point correlation functions), and out-of-sample data are subsequently generated by inferring the data-generating distribution from these statistics. In this way the NCoder suggests a fascinating correspondence between perturbative renormalizability and the sufficiency of models. We demonstrate the efficacy of the NCoder by applying it to the generation of MNIST images, and find that generated images can be correctly classified using only information from the first three $n$-point functions of the image distribution.
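As a rough illustration of the summary statistics involved, the sketch below computes empirical 1-, 2- and 3-point functions of an ensemble of images, treating each image as one draw of a field $\phi(x)$ on a pixel lattice. It is not the NCoder itself: which subset of correlation functions the NCoder keeps, and how they enter its latent layer, is not specified here, and the function names and the random 8x8 stand-in data are hypothetical. The full 3-point tensor grows as $P^3$ in the number of pixels $P$, which is why the example uses tiny images.

```python
import numpy as np


def n_point_functions(images):
    """Empirical 1-, 2- and 3-point functions of an image ensemble.

    images: array of shape (N, H, W); each image is treated as one draw of a
    lattice field phi(x), with x running over the flattened pixel index.
    Returns full (not connected) moments:
    <phi(x)>, <phi(x) phi(y)>, <phi(x) phi(y) phi(z)>.
    """
    phi = images.reshape(len(images), -1).astype(float)  # shape (N, P)
    n = phi.shape[0]
    g1 = phi.mean(axis=0)                                # shape (P,)
    g2 = np.einsum("ni,nj->ij", phi, phi) / n            # shape (P, P)
    g3 = np.einsum("ni,nj,nk->ijk", phi, phi, phi) / n   # shape (P, P, P)
    return g1, g2, g3


def connected_two_point(g1, g2):
    """Connected 2-point function <phi(x) phi(y)> - <phi(x)><phi(y)>."""
    return g2 - np.outer(g1, g1)


# Hypothetical stand-in data: 200 random 8x8 "images" (P = 64 pixels).
rng = np.random.default_rng(0)
images = rng.random((200, 8, 8))
g1, g2, g3 = n_point_functions(images)
c2 = connected_two_point(g1, g2)
print(g1.shape, g2.shape, g3.shape)
```

Computed this way, the connected 2-point function coincides with the biased pixel covariance matrix, `np.cov(..., rowvar=False, bias=True)`.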
Submitted 3 June, 2025; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Bayesian Renormalization
Authors:
David S. Berman,
Marc S. Klinger,
Alexander G. Stapleton
Abstract:
In this note we present a fully information-theoretic approach to renormalization inspired by Bayesian statistical inference, which we refer to as Bayesian Renormalization. The main insight of Bayesian Renormalization is that the Fisher metric defines a correlation length that plays the role of an emergent RG scale quantifying the distinguishability between nearby points in the space of probability distributions. This RG scale can be interpreted as a proxy for the maximum number of unique observations that can be made about a given system during a statistical inference experiment. The Bayesian Renormalization scheme then prepares an effective model for a given system up to a precision bounded by this scale. In applications of Bayesian Renormalization to physical systems, the emergent information-theoretic scale is naturally identified with the maximum energy that can be probed by current experimental apparatus, and thus Bayesian Renormalization coincides with ordinary renormalization. However, Bayesian Renormalization is sufficiently general to apply even in circumstances in which an immediate physical scale is absent, and thus provides an ideal approach to renormalization in data science contexts. To this end, we provide insight into how the Bayesian Renormalization scheme relates to existing methods for data compression and data generation such as the information bottleneck and the diffusion learning paradigm. We conclude by designing an explicit form of Bayesian Renormalization inspired by Wilson's momentum shell renormalization scheme in Quantum Field Theory. We apply this Bayesian Renormalization scheme to a simple neural network and verify the sense in which it organizes the parameters of the model according to a hierarchy of information-theoretic importance.
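The statement that the Fisher metric quantifies distinguishability between nearby distributions can be checked on a textbook case. For the Gaussian family $N(\mu, \sigma^2)$ in coordinates $\theta = (\mu, \sigma)$, the Fisher metric is $g = \mathrm{diag}(1/\sigma^2,\, 2/\sigma^2)$, and the Kullback-Leibler divergence between nearby members satisfies $D_{\mathrm{KL}}(p_\theta \,\|\, p_{\theta+d\theta}) \approx \tfrac{1}{2}\, d\theta^\top g\, d\theta$. The sketch below compares the exact Gaussian KL divergence with this quadratic form. It illustrates only this standard information-geometric fact, not the Bayesian Renormalization scheme itself; the function names and parameter values are chosen for illustration.

```python
import numpy as np


def fisher_metric_gaussian(mu, sigma):
    """Fisher information metric of N(mu, sigma^2) in (mu, sigma) coordinates.

    The metric does not depend on mu; mu is accepted only for a uniform signature.
    """
    return np.array([[1.0 / sigma**2, 0.0],
                     [0.0, 2.0 / sigma**2]])


def kl_gaussian(mu1, s1, mu2, s2):
    """Exact KL divergence KL( N(mu1, s1^2) || N(mu2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2.0 * s2**2) - 0.5


theta = np.array([0.0, 1.5])       # (mu, sigma)
dtheta = np.array([1e-3, -2e-3])   # small displacement in parameter space
g = fisher_metric_gaussian(*theta)
kl = kl_gaussian(*theta, *(theta + dtheta))
quadratic = 0.5 * dtheta @ g @ dtheta
print(kl, quadratic)  # agree to leading (second) order in dtheta
```

Shrinking `dtheta` makes the relative gap between the two numbers shrink proportionally, which is the sense in which the metric governs local distinguishability.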
Submitted 9 October, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
On the Dynamics of Inference and Learning
Authors:
David S. Berman,
Jonathan J. Heckman,
Marc Klinger
Abstract:
Statistical inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data become available, this probability distribution is updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first-order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cramér-Rao bound is saturated the learning rate is governed by a simple $1/T$ power law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian random processes and inference of the coupling constant in the 1D Ising model. Finally, we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmark data sets such as MNIST and CIFAR10, and show that for networks exhibiting small final losses the simple power law is also satisfied.
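The $1/T$ behaviour can be seen in the simplest conjugate example: inferring the mean of a Gaussian with known noise variance $\sigma^2$ from a prior $N(\mu_0, \tau_0^2)$. After $T$ observations the posterior variance is $(1/\tau_0^2 + T/\sigma^2)^{-1}$, which approaches the Cramér-Rao bound $\sigma^2/T$ for large $T$. The sketch below applies Bayes' theorem one datum at a time and checks the power law. It is a discrete-update illustration under these assumptions, not the paper's continuous flow equation; all names and numbers are hypothetical.

```python
import numpy as np


def posterior_variance(T, sigma2=1.0, tau0_2=10.0):
    """Closed-form posterior variance of a Gaussian mean after T observations."""
    return 1.0 / (1.0 / tau0_2 + T / sigma2)


def sequential_bayes(data, mu0=0.0, tau0_2=10.0, sigma2=1.0):
    """Apply Bayes' theorem one observation at a time (conjugate Gaussian update).

    Returns the posterior mean and variance after each observation.
    """
    mu, var = mu0, tau0_2
    means, variances = [], []
    for x in data:
        precision = 1.0 / var + 1.0 / sigma2   # precisions add under conjugate updates
        mu = (mu / var + x / sigma2) / precision
        var = 1.0 / precision
        means.append(mu)
        variances.append(var)
    return np.array(means), np.array(variances)


rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)  # true mean 2, sigma2 = 1
means, variances = sequential_bayes(data)
T = np.arange(1, len(data) + 1)

# Log-log slope of the posterior variance against T at late times: close to -1.
slope = np.polyfit(np.log(T[100:]), np.log(variances[100:]), 1)[0]
print(slope, T[-1] * variances[-1])
```

Here the posterior mean tracks the sample mean, an estimator that saturates the Cramér-Rao bound, which is why $T \cdot \mathrm{Var}$ tends to $\sigma^2$.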
Submitted 19 April, 2022;
originally announced April 2022.
-
A three-stage magnetic phase transition revealed in ultrahigh-quality van der Waals magnet CrSBr
Authors:
Wenhao Liu,
Xiaoyu Guo,
Jonathan Schwartz,
Hongchao Xie,
Nikhil Dhale,
Suk Hyun Sung,
Aswin L. N. Kondusamy,
Xiqu Wang,
Haonan Zhao,
Diana Berman,
Robert Hovden,
Liuyan Zhao,
Bing Lv
Abstract:
Van der Waals (vdW) magnets are receiving growing attention due to their significance both for fundamental research on low-dimensional magnetism and for potential applications in spintronic devices. High crystalline quality is key to maintaining the intrinsic magnetic and electronic properties of vdW magnets, especially when they are exfoliated down to the 2D limit. Here, ultrahigh-quality, air-stable vdW CrSBr crystals are synthesized using the direct vapor-solid synthesis method. Their high single crystallinity and spatial homogeneity are thoroughly confirmed at length scales from sub-mm to atomic resolution by X-ray diffraction, second harmonic generation, and scanning transmission electron microscopy. More importantly, specific heat measurements of these ultrahigh-quality CrSBr crystals show three thermodynamic anomalies, at 185 K, 156 K, and 132 K, revealing a stage-by-stage development of the magnetic order upon cooling, which is also corroborated by magnetization and transport results. Our ultrahigh-quality CrSBr can also be readily exfoliated down to bilayers and monolayers, paving the way to integrate it into heterostructures for spintronic and magneto-optoelectronic applications.
Submitted 17 March, 2022;
originally announced March 2022.
-
Canted Antiferromagnetism in the Quasi-1D Iron Chalcogenide BaFe$_{2}$Se$_{4}$
Authors:
Xiaoyuan Liu,
Keith M. Taddei,
Sheng Li,
Wenhao Liu,
Nikhil Dhale,
Rashad Kadado,
Diana Berman,
Clarina Dela Cruz,
Bing Lv
Abstract:
We report the synthesis and physical properties of the quasi-1D iron chalcogenide $\rm BaFe_2Se_4$, which shares the $\rm FeSe_4$ tetrahedral building motif commonly seen in the iron chalcogenide superconductors. A high-quality polycrystalline sample was synthesized by a solid-state reaction method and characterized by X-ray diffraction, electrical resistivity, magnetic susceptibility, and neutron diffraction measurements. $\rm BaFe_2Se_4$ is a narrow-gap semiconductor that orders magnetically at $\sim$310 K. Both neutron powder diffraction results and isothermal M-H loops suggest a canted antiferromagnetic structure in which the Fe sublattices are antiferromagnetically ordered along the c-axis (the quasi-1D chain direction), resulting in a net ferromagnetic moment in the perpendicular direction, along the a-axis, with a tilt angle of 18.7$^\circ$ towards the b-axis.
Submitted 20 August, 2020;
originally announced August 2020.
-
Singular dynamics in the failure of soft adhesive contacts
Authors:
Justin D. Berman,
Manjari Randeria,
Robert W. Style,
Qin Xu,
James R. Nichols,
Aidan J. Duncan,
Michael Loewenberg,
Eric R. Dufresne,
Katharine E. Jensen
Abstract:
We characterize the mechanical recovery of compliant silicone gels following adhesive contact failure. We establish broad, stable adhesive contacts between rigid microspheres and soft gels, then stretch the gels to large deformations by pulling quasi-statically on the contact. Eventually, the adhesive contact begins to fail, and ultimately slides to a final contact point on the bottom of the sphere. Immediately after detachment, the gel recoils quickly with a self-similar surface profile that evolves as a power law in time, suggesting that the adhesive detachment point is singular. The singular dynamics we observe are consistent with a relaxation process driven by surface stress and slowed by viscous flow through the porous, elastic network of the gel. Our results emphasize the importance of accounting for both the liquid and solid phases of gels in understanding their mechanics, especially under extreme deformation.
Submitted 11 October, 2018;
originally announced October 2018.
-
Bit Patterned Magnetic Recording: Theory, Media Fabrication, and Recording Performance
Authors:
Thomas R. Albrecht,
Hitesh Arora,
Vipin Ayanoor-Vitikkate,
Jean-Marc Beaujour,
Daniel Bedau,
David Berman,
Alexei L. Bogdanov,
Yves-Andre Chapuis,
Julia Cushen,
Elizabeth E. Dobisz,
Gregory Doerk,
He Gao,
Michael Grobis,
Bruce Gurney,
Weldon Hanson,
Olav Hellwig,
Toshiki Hirano,
Pierre-Olivier Jubert,
Dan Kercher,
Jeffrey Lille,
Zuwei Liu,
C. Mathew Mate,
Yuri Obukhov,
Kanaiyalal C. Patel,
Kurt Rubin
, et al. (6 additional authors not shown)
Abstract:
Bit Patterned Media (BPM) for magnetic recording provide a route to densities $>1\,\mathrm{Tb/in^2}$ and circumvent many of the challenges associated with conventional granular media technology. Instead of recording a bit on an ensemble of random grains, BPM uses an array of lithographically defined, isolated magnetic islands, each of which stores one bit. Fabrication of BPM is viewed as the greatest challenge for its commercialization. In this article we describe a BPM fabrication method that combines e-beam lithography, directed self-assembly of block copolymers, self-aligned double patterning, nanoimprint lithography, and ion milling to generate BPM based on CoCrPt alloys. This combination of fabrication technologies achieves feature sizes of $<10\,\mathrm{nm}$, significantly smaller than what conventional semiconductor nanofabrication methods can achieve. In contrast to earlier work, which used hexagonal close-packed arrays of round islands, our latest approach creates BPM with rectangular bit cells, which are advantageous for integration with existing hard disk drive technology. The advantages of rectangular bits are analyzed from a theoretical and modeling point of view, and system integration requirements such as servo patterns, implementation of write synchronization, and provision of a stable head-disk interface are addressed in the context of experimental results. Optimization of magnetic alloy materials for thermal stability, writeability, and switching field distribution is discussed, and a new method for growing BPM islands on a patterned template is presented. New recording results at $1.6\,\mathrm{Td/in^2}$ (teradots per square inch, roughly equivalent to $1.3\,\mathrm{Tb/in^2}$) demonstrate a raw error rate $<10^{-2}$, consistent with the recording system requirements of modern hard drives. The extendibility of BPM to higher densities, and its eventual combination with energy-assisted recording, are also explored.
Submitted 19 March, 2015;
originally announced March 2015.
-
Spin polarization oscillations without spin precession: spin-orbit entangled resonances in quasi-one-dimensional spin transport
Authors:
D. H. Berman,
M. Khodas,
M. E. Flatté
Abstract:
Resonant behavior involving spin-orbit entangled states occurs for spin transport along a narrow channel defined in a two-dimensional electron gas, including an apparent rapid relaxation of the spin polarization for special values of the channel width and applied magnetic field (so-called ballistic spin resonance). A fully quantum mechanical theory for transport through multiple subbands of the one-dimensional system provides the dependence of the spin transport on the applied magnetic field and channel width, including a resonant depolarization of spins when the Zeeman energy matches the subband energy splittings and a spin texture transverse to the magnetic field. The resonance phenomenon is robust to disorder.
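The resonance condition stated above, Zeeman energy equal to a subband energy splitting, can be turned into a rough field estimate. The sketch below assumes a hard-wall channel of width $W$, for which $E_n = \hbar^2\pi^2 n^2/(2m^*W^2)$, and GaAs-like parameters ($m^* = 0.067\,m_e$, $|g| = 0.44$). These confinement and material choices are assumptions made for illustration only; the paper's own model may differ, and spin-orbit corrections to the subband energies are ignored.

```python
import numpy as np

HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg
MU_B = 9.2740100783e-24  # Bohr magneton, J/T


def subband_energy(n, width, m_eff):
    """Hard-wall subband energy E_n = hbar^2 pi^2 n^2 / (2 m* W^2), n = 1, 2, ..."""
    return (HBAR * np.pi * n) ** 2 / (2.0 * m_eff * width**2)


def resonant_field(n, width, m_eff=0.067 * M_E, g=0.44):
    """Field B at which the Zeeman energy |g| mu_B B equals E_{n+1} - E_n.

    The default m_eff and g are assumed GaAs-like values, not taken from the paper.
    """
    splitting = subband_energy(n + 1, width, m_eff) - subband_energy(n, width, m_eff)
    return splitting / (abs(g) * MU_B)


for w in (0.5e-6, 1e-6, 2e-6):   # hypothetical channel widths in metres
    print(f"W = {w * 1e6:.1f} um: B_res(1->2) = {resonant_field(1, w):.3f} T")
```

Within this model the resonant field scales as $1/W^2$ and, at fixed width, grows with subband index as $2n+1$.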
Submitted 17 September, 2013;
originally announced September 2013.
-
Electron beam formation from spin-orbit interactions in zincblende semiconductor quantum wells
Authors:
David H. Berman,
Michael E. Flatté
Abstract:
We find a dramatic enhancement of electron propagation along a narrow range of real-space angles from an isotropic source in a two-dimensional quantum well made from a zincblende semiconductor. This "electron beam" formation is caused by the interplay between the spin-orbit interaction originating from an electric field perpendicular to the quantum well and the intrinsic spin-orbit field of the zincblende crystal lattice, in situations where the two fields differ in strength but are of the same order of magnitude. Beam formation is associated with caustics and can be described semiclassically using a stationary-phase analysis.
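One ingredient of this interplay is that combining the two spin-orbit fields makes the spin splitting depend on the direction of the wavevector. The sketch below assumes the common linear forms for a [001] well: a Rashba field $\alpha(k_y, -k_x)$ from the perpendicular electric field and a linear Dresselhaus field $\beta(k_x, -k_y)$ from the crystal lattice (sign conventions vary, and cubic Dresselhaus terms are dropped). Together they give $|\mathbf{b}|^2 = k^2(\alpha^2 + \beta^2 + 2\alpha\beta\sin 2\varphi)$. The example illustrates only this anisotropy; the beam formation itself follows from the caustic and stationary-phase analysis mentioned in the abstract, which is not reproduced here, and the coupling values are hypothetical.

```python
import numpy as np


def spin_orbit_field(phi, k, alpha, beta):
    """In-plane effective field b(k) = Rashba alpha*(ky, -kx) + linear Dresselhaus beta*(kx, -ky)."""
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    bx = alpha * ky + beta * kx
    by = -alpha * kx - beta * ky
    return bx, by


def spin_splitting(phi, k, alpha, beta):
    """Energy splitting between the two spin branches, 2|b(k)|."""
    bx, by = spin_orbit_field(phi, k, alpha, beta)
    return 2.0 * np.hypot(bx, by)


# Hypothetical couplings of the same order but different strength (arbitrary units).
alpha, beta, k = 1.0, 0.6, 1.0
phis = np.linspace(0.0, 2.0 * np.pi, 3601)   # 0.1 degree steps
split = spin_splitting(phis, k, alpha, beta)
# Extremes 2k(alpha + beta) and 2k|alpha - beta| occur at phi = pi/4 and 3pi/4 (mod pi).
print(split.max() / split.min())
```

When $\alpha = \beta$ the splitting vanishes along one diagonal, while for comparable but unequal couplings it stays finite yet strongly direction-dependent.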
Submitted 7 September, 2010;
originally announced September 2010.
-
Single electron capacitance spectroscopy of vertical quantum dots using a single electron transistor
Authors:
M. Koltonyuk,
D. Berman,
N. B. Zhitenev,
R. C. Ashoori,
N. Pfeiffer,
K. W. West
Abstract:
We have incorporated an aluminum single electron transistor (SET) directly on top of a vertical quantum dot, enabling the use of the SET as an electrometer that is extremely responsive to the motion of charge into and out of the dot. Charge induced on the SET central island from single electron additions to the dot modulates the SET output, and we describe two methods for demodulation that permit quantitative extraction of the quantum dot capacitance signal. The two methods produce closely similar results for the determined single electron capacitance peaks.
Submitted 12 June, 1998; v1 submitted 29 May, 1998;
originally announced May 1998.
-
Observation of Quantum Fluctuations of Charge on a Quantum Dot
Authors:
D. Berman,
N. B. Zhitenev,
R. C. Ashoori,
M. Shayegan
Abstract:
We have incorporated an aluminum single electron transistor directly into the defining gate structure of a semiconductor quantum dot, permitting precise measurement of the charge in the dot. Voltage biasing a gate draws charge from a reservoir into the dot through a single point contact. The charge in the dot increases continuously for large point contact conductance and in a step-like manner in units of single electrons with the contact nearly closed. We measure the corresponding capacitance lineshapes for the full range of point contact conductances. The lineshapes are described well by perturbation theory and not by theories in which the dot charging energy is altered by the barrier conductance.
Submitted 6 April, 1998; v1 submitted 30 March, 1998;
originally announced March 1998.