-
A Two-Phase Perspective on Deep Learning Dynamics
Authors:
Robert de Mello Koch,
Animik Ghosh
Abstract:
We propose that learning in deep neural networks proceeds in two phases: a rapid curve fitting phase followed by a slower compression or coarse graining phase. This view is supported by the shared temporal structure of three phenomena: grokking, double descent and the information bottleneck, all of which exhibit a delayed onset of generalization well after training error reaches zero. We empirically show that the associated timescales align in two rather different settings. Mutual information between hidden layers and input data emerges as a natural progress measure, complementing circuit-based metrics such as local complexity and the linear mapping number. We argue that the second phase is not actively optimized by standard training algorithms and may be unnecessarily prolonged. Drawing on an analogy with the renormalization group, we suggest that this compression phase reflects a principled form of forgetting, critical for generalization.
Submitted 17 April, 2025;
originally announced April 2025.
-
Quantum Boltzmann machine learning of ground-state energies
Authors:
Dhrumil Patel,
Daniel Koch,
Saahil Patel,
Mark M. Wilde
Abstract:
Estimating the ground-state energy of Hamiltonians is a fundamental task for which it is believed that quantum computers can be helpful. Several approaches have been proposed toward this goal, including algorithms based on quantum phase estimation and hybrid quantum-classical optimizers involving parameterized quantum circuits, the latter falling under the umbrella of the variational quantum eigensolver. Here, we analyze the performance of quantum Boltzmann machines for this task, which is a less explored ansatz based on parameterized thermal states and which is not known to suffer from the barren-plateau problem. We delineate a hybrid quantum-classical algorithm for this task and rigorously prove that it converges to an $\varepsilon$-approximate stationary point of the energy function optimized over parameter space, while using a number of parameterized-thermal-state samples that is polynomial in $\varepsilon^{-1}$, the number of parameters, and the norm of the Hamiltonian being optimized. Our algorithm estimates the gradient of the energy function efficiently by means of a novel quantum circuit construction that combines classical sampling, Hamiltonian simulation, and the Hadamard test, thus overcoming a key obstacle to quantum Boltzmann machine learning that has been left open since [Amin et al., Phys. Rev. X 8, 021050 (2018)]. Additionally supporting our main claims are calculations of the gradient and Hessian of the energy function, as well as an upper bound on the matrix elements of the latter that is used in the convergence analysis.
Submitted 30 October, 2024; v1 submitted 16 October, 2024;
originally announced October 2024.
-
MinePlanner: A Benchmark for Long-Horizon Planning in Large Minecraft Worlds
Authors:
William Hill,
Ireton Liu,
Anita De Mello Koch,
Damion Harvey,
Nishanth Kumar,
George Konidaris,
Steven James
Abstract:
We propose a new benchmark for planning tasks based on the Minecraft game. Our benchmark contains 45 tasks overall, but also provides support for creating both propositional and numeric instances of new Minecraft tasks automatically. We benchmark numeric and propositional planning systems on these tasks, with results demonstrating that state-of-the-art planners are currently incapable of dealing with many of the challenges advanced by our new benchmark, such as scaling to instances with thousands of objects. Based on these results, we identify areas of improvement for future planners. Our framework is made available at https://github.com/IretonLiu/mine-pddl/.
Submitted 28 April, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Finite difference method in prolate spheroidal coordinates for freely suspended spheroidal particles in linear flows of viscous and viscoelastic fluids
Authors:
Arjun Sharma,
Donald L. Koch
Abstract:
A finite difference scheme is used to develop a numerical method to solve the flow of an unbounded viscoelastic fluid with zero to moderate inertia around a prolate spheroidal particle. The equations are written in prolate spheroidal coordinates, and the shape of the particle is exactly resolved as one of the coordinate surfaces representing the inner boundary of the computational domain. As the prolate spheroidal grid is naturally clustered near the particle surface, good resolution is obtained in the regions where the gradients of relevant flow variables are most significant. This coordinate system also allows large domain sizes with a reasonable number of mesh points to simulate unbounded fluid around a particle. Changing the aspect ratio of the inner computational boundary enables simulations of different particle shapes ranging from a sphere to a slender fiber. Numerical studies of the latter particle shape allow testing of slender body theories. The mass and momentum equations are solved with a Schur complement approach allowing us to solve the zero inertia case necessary to isolate the viscoelastic effects. The singularities associated with the coordinate system are overcome using L'Hôpital's rule. A straightforward imposition of conditions representing a time-varying combination of linear flows on the outer boundary allows us to study various flows with the same computational domain geometry. For the special but important case of zero fluid and particle inertia we obtain a novel formulation that satisfies the force- and torque-free constraint in an iteration-free manner. The numerical method is demonstrated for various flows of Newtonian and viscoelastic fluids around spheres and spheroids (including those with large aspect ratio). Good agreement is demonstrated with existing theoretical and numerical results.
Submitted 10 October, 2023;
originally announced October 2023.
-
Why Unsupervised Deep Networks Generalize
Authors:
Anita de Mello Koch,
Ellen de Mello Koch,
Robert de Mello Koch
Abstract:
Promising resolutions of the generalization puzzle observe that the actual number of parameters in a deep network is much smaller than naive estimates suggest. The renormalization group is a compelling example of a problem which has very few parameters, despite the fact that naive estimates suggest otherwise. Our central hypothesis is that the mechanisms behind the renormalization group are also at work in deep learning, and that this leads to a resolution of the generalization puzzle. We show detailed quantitative evidence that proves the hypothesis for an RBM, by showing that the trained RBM is discarding high momentum modes. Specializing attention mainly to autoencoders, we give an algorithm to determine the network's parameters directly from the learning data set. The resulting autoencoder performs almost as well as one trained by deep learning, and it provides an excellent initial condition for training, reducing training times by a factor between 4 and 100 for the experiments we considered. Further, we are able to suggest a simple criterion to decide whether or not a given problem can be solved using a deep network.
Submitted 7 December, 2020;
originally announced December 2020.
-
Short sighted deep learning
Authors:
Ellen de Mello Koch,
Anita de Mello Koch,
Nicholas Kastanos,
Ling Cheng
Abstract:
A theory explaining how deep learning works is yet to be developed. Previous work suggests that deep learning performs a coarse graining, similar in spirit to the renormalization group (RG). This idea has been explored in the setting of a local (nearest neighbor interactions) Ising spin lattice. We extend the discussion to the setting of a long range spin lattice. Markov Chain Monte Carlo (MCMC) simulations determine both the critical temperature and scaling dimensions of the system. The model is used to train both a single RBM (restricted Boltzmann machine) network, as well as a stacked RBM network. Following earlier Ising model studies, the trained weights of a single layer RBM network define a flow of lattice models. In contrast to results for nearest neighbor Ising, the RBM flow for the long ranged model does not converge to the correct values for the spin and energy scaling dimension. Further, correlation functions between visible and hidden nodes exhibit key differences between the stacked RBM and RG flows. The stacked RBM flow appears to move towards low temperatures whereas the RG flow moves towards high temperature. This again differs from results obtained for nearest neighbor Ising.
Submitted 7 February, 2020;
originally announced February 2020.
-
FOS: A Modular FPGA Operating System for Dynamic Workloads
Authors:
Anuj Vaishnav,
Khoa Dang Pham,
Joseph Powell,
Dirk Koch
Abstract:
With FPGAs now being deployed in the cloud and at the edge, there is a need for scalable design methods which can incorporate the heterogeneity present in the hardware and software components of FPGA systems. Moreover, these FPGA systems need to be maintainable and adaptable to changing workloads while improving accessibility for the application developers. However, current FPGA systems fail to achieve modularity and support for multi-tenancy due to dependencies between system components and lack of standardised abstraction layers. To solve this, we introduce a modular FPGA operating system -- FOS, which adopts a modular FPGA development flow to allow each system component to be changed and be agnostic to the heterogeneity of EDA tool versions, hardware and software layers. Further, to dynamically maximise the utilisation transparently from the users, FOS employs resource-elastic scheduling to arbitrate the FPGA resources in both time and spatial domain for any type of accelerators. Our evaluation on different FPGA boards shows that FOS can provide performance improvements in both single-tenant and multi-tenant environments while substantially reducing the development time and, at the same time, improving flexibility.
Submitted 26 January, 2020;
originally announced January 2020.
-
Is Deep Learning a Renormalization Group Flow?
Authors:
Ellen de Mello Koch,
Robert de Mello Koch,
Ling Cheng
Abstract:
Although there has been a rapid development of practical applications, theoretical explanations of deep learning are in their infancy. Deep learning performs a sophisticated coarse graining. Since coarse graining is a key ingredient of the renormalization group (RG), RG may provide a useful theoretical framework directly relevant to deep learning. In this study we pursue this possibility. A statistical mechanics model for a magnet, the Ising model, is used to train an unsupervised restricted Boltzmann machine (RBM). The patterns generated by the trained RBM are compared to the configurations generated through an RG treatment of the Ising model. Although we are motivated by the connection between deep learning and RG flow, in this study we focus mainly on comparing a single layer of a deep network to a single step in the RG flow. We argue that correlation functions between hidden and visible neurons are capable of diagnosing RG-like coarse graining. Numerical experiments show the presence of RG-like patterns in correlators computed using the trained RBMs. The observables we consider are also able to exhibit important differences between RG and deep learning.
Submitted 10 June, 2020; v1 submitted 12 June, 2019;
originally announced June 2019.
-
Visual Estimation of Building Condition with Patch-level ConvNets
Authors:
David Koch,
Miroslav Despotovic,
Muntaha Sakeena,
Mario Döller,
Matthias Zeppelzauer
Abstract:
The condition of a building is an important factor for real estate valuation. Currently, the estimation of condition is determined by real estate appraisers which makes it subjective to a certain degree. We propose a novel vision-based approach for the assessment of the building condition from exterior views of the building. To this end, we develop a multi-scale patch-based pattern extraction approach and combine it with convolutional neural networks to estimate building condition from visual clues. Our evaluation shows that visually estimated building condition can serve as a proxy for condition estimates by appraisers.
Submitted 26 April, 2018;
originally announced April 2018.
-
Automatic Prediction of Building Age from Photographs
Authors:
Matthias Zeppelzauer,
Miroslav Despotovic,
Muntaha Sakeena,
David Koch,
Mario Döller
Abstract:
We present a first method for the automated age estimation of buildings from unconstrained photographs. To this end, we propose a two-stage approach that first learns characteristic visual patterns for different building epochs at patch-level and then globally aggregates patch-level age estimates over the building. We compile evaluation datasets from different sources and perform a detailed evaluation of our approach, its sensitivity to parameters, and the capabilities of the employed deep networks to learn characteristic visual age-related patterns. Results show that our approach is able to estimate building age at a surprisingly high level that even outperforms human evaluators and thereby sets a new performance baseline. This work represents a first step towards the automated assessment of building parameters for automated price prediction.
Submitted 19 April, 2018; v1 submitted 6 April, 2018;
originally announced April 2018.
-
Proceedings of the Second International Workshop on FPGAs for Software Programmers (FSP 2015)
Authors:
Frank Hannig,
Dirk Koch,
Daniel Ziener
Abstract:
This volume contains the papers accepted at the Second International Workshop on FPGAs for Software Programmers (FSP 2015), held in London, United Kingdom, September 1st, 2015. FSP 2015 was co-located with the International Conference on Field Programmable Logic and Applications (FPL).
Submitted 25 August, 2015;
originally announced August 2015.
-
Proceedings of the First International Workshop on FPGAs for Software Programmers (FSP 2014)
Authors:
Frank Hannig,
Dirk Koch,
Daniel Ziener
Abstract:
This volume contains the papers accepted at the First International Workshop on FPGAs for Software Programmers (FSP 2014), held in Munich, Germany, September 1st, 2014. FSP 2014 was co-located with the International Conference on Field Programmable Logic and Applications (FPL).
Submitted 27 February, 2015; v1 submitted 18 August, 2014;
originally announced August 2014.
-
No-Break Dynamic Defragmentation of Reconfigurable Devices
Authors:
Sandor Fekete,
Tom Kamphans,
Nils Schweer,
Christopher Tessars,
Jan C. van der Veen,
Josef Angermeier,
Dirk Koch,
Juergen Teich
Abstract:
We propose a new method for defragmenting the module layout of a reconfigurable device, enabled by a novel approach for dealing with communication needs between relocated modules and with inhomogeneities found in commonly used FPGAs. Our method is based on dynamic relocation of module positions during runtime, with only very little reconfiguration overhead; the objective is to maximize the length of contiguous free space that is available for new modules. We describe a number of algorithmic aspects of good defragmentation, and present an optimization method based on tabu search. Experimental results indicate that we can improve the quality of module layout by roughly 50% over static layout. Among other benefits, this improvement avoids unnecessary rejections of modules.
Submitted 7 November, 2011; v1 submitted 23 December, 2010;
originally announced December 2010.