-
Singularities in Bayesian Inference: Crucial or Overstated?
Authors:
Maria De Iorio,
Andreas Heinecke,
Beatrice Franzolini,
Rafael Cabral
Abstract:
Over the past two decades, shrinkage priors have become increasingly popular, and many proposals can be found in the literature. These priors aim to shrink small effects to zero while maintaining true large effects. Horseshoe-type priors have been particularly successful in various applications, mainly due to their computational advantages. However, there is no clear guidance on choosing the most appropriate prior for a specific setting. In this work, we propose a framework that encompasses a large class of shrinkage distributions, including priors with and without a singularity at zero. By reframing such priors in the context of reliability theory and wealth distributions, we provide insights into the prior parameters and shrinkage properties. The paper's key contributions are based on studying the folded version of such distributions, which we refer to as the Gambel distribution. The Gambel can be rewritten as the ratio between a Generalised Gamma and a Generalised Beta of the second kind. This representation allows us to gain insights into the behaviours near the origin and along the tails, compute measures to compare their distributional properties, derive consistency results, devise MCMC schemes for posterior inference and ultimately provide guidance on the choice of the hyperparameters.
Submitted 11 January, 2025;
originally announced January 2025.
-
Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations
Authors:
Greg Henry,
Ping Tak Peter Tang,
Alexander Heinecke
Abstract:
In recent years, fused-multiply-add (FMA) units with lower-precision multiplications and higher-precision accumulation have proven useful in machine learning/artificial intelligence applications, most notably in training deep neural networks, due to their extreme computational intensity. Compared to classical IEEE-754 32-bit (FP32) and 64-bit (FP64) arithmetic, reduced-precision arithmetic can naturally be sped up disproportionately to its shortened width. The common strategy of all major hardware vendors is to aggressively and disproportionately enhance their performance further. One particular FMA operation that multiplies two BF16 numbers while accumulating in FP32 has been found useful in deep learning, where BF16 is the 16-bit floating-point datatype with the IEEE FP32 numerical range but 8 significant bits of precision. In this paper, we examine the use of this FMA unit to implement higher-precision matrix routines in terms of potential performance gain and implications for accuracy. We demonstrate how a decomposition into multiple smaller datatypes can be used to assemble a high-precision result, leveraging the higher-precision accumulation of the FMA unit. We first demonstrate that computations of vector inner products and, by natural extension, matrix-matrix products can be achieved by decomposing FP32 numbers into several BF16 numbers followed by appropriate computations that can accommodate the dynamic range and preserve accuracy compared to standard FP32 computations, while projecting up to 5.2x speed-up. Furthermore, we examine the solution of linear equations formulated in the residual form that allows for iterative refinement. We demonstrate that the solutions obtained are comparable to those offered by FP64 under a large range of linear system condition numbers.
Submitted 12 April, 2019;
originally announced April 2019.
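The decomposition idea in the abstract above (splitting each FP32 number into several BF16 numbers and accumulating BF16 products in FP32) can be illustrated with a small NumPy emulation. This is only a sketch under stated assumptions, not the authors' implementation: the helper names `to_bf16`, `split_bf16` and `dot_split` are hypothetical, BF16 conversion is done by truncation rather than round-to-nearest, and all cross terms are summed even though a tuned routine might drop the smallest ones.

```python
import numpy as np

def to_bf16(x):
    """Truncate float32 values to bfloat16 precision (stored in float32).

    Assumption: truncation toward zero by masking the low 16 bits;
    real hardware typically rounds to nearest instead.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

def split_bf16(x, terms=3):
    """Split float32 x into `terms` BF16-representable pieces whose sum approximates x."""
    parts = []
    r = np.asarray(x, dtype=np.float32)
    for _ in range(terms):
        p = to_bf16(r)       # leading 8 significant bits of the remainder
        parts.append(p)
        r = r - p            # what is left for the next piece
    return parts

def dot_split(x, y, terms=3):
    """Inner product built from BF16 pieces, accumulated in float32."""
    xs, ys = split_bf16(x, terms), split_bf16(y, terms)
    acc = np.float32(0.0)
    for a in xs:
        for b in ys:
            # each elementwise product of two BF16 values is exact in float32
            acc += np.dot(a, b)
    return acc

rng = np.random.default_rng(0)
x = rng.standard_normal(256).astype(np.float32)
y = rng.standard_normal(256).astype(np.float32)
ref = np.dot(x.astype(np.float64), y.astype(np.float64))
print("one BF16 piece :", abs(float(dot_split(x, y, terms=1)) - ref))
print("three pieces   :", abs(float(dot_split(x, y, terms=3)) - ref))
```

With truncation, three 8-bit pieces cover the 24-bit FP32 significand, which is why the three-term split reproduces the input for values in the normal range.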
-
Mixed Precision Training of Convolutional Neural Networks using Integer Operations
Authors:
Dipankar Das,
Naveen Mellempudi,
Dheevatsa Mudigere,
Dhiraj Kalamkar,
Sasikanth Avancha,
Kunal Banerjee,
Srinivas Sridharan,
Karthik Vaidyanathan,
Bharat Kaul,
Evangelos Georganas,
Alexander Heinecke,
Pradeep Dubey,
Jesus Corbal,
Nikita Shustrov,
Roma Dubtsov,
Evarist Fomenko,
Vadim Pirogov
Abstract:
The state-of-the-art (SOTA) for mixed precision training is dominated by variants of low precision floating point operations, and in particular, FP16 accumulating into FP32 (Micikevicius et al., 2017). On the other hand, while a lot of research has also happened in the domain of low and mixed-precision Integer training, these works either present results for non-SOTA networks (for instance only AlexNet for ImageNet-1K), or relatively small datasets (like CIFAR-10). In this work, we train state-of-the-art visual understanding neural networks on the ImageNet-1K dataset, with Integer operations on General Purpose (GP) hardware. In particular, we focus on Integer Fused-Multiply-and-Accumulate (FMA) operations which take two pairs of INT16 operands and accumulate results into an INT32 output. We propose a shared exponent representation of tensors and develop a Dynamic Fixed Point (DFP) scheme suitable for common neural network operations. The nuances of developing an efficient integer convolution kernel are examined, including methods to handle overflow of the INT32 accumulator. We implement CNN training for ResNet-50, GoogLeNet-v1, VGG-16 and AlexNet; and these networks achieve or exceed SOTA accuracy within the same number of iterations as their FP32 counterparts without any change in hyper-parameters and with a 1.8X improvement in end-to-end training throughput. To the best of our knowledge, these results represent the first INT16 training results on GP hardware for the ImageNet-1K dataset using SOTA CNNs and achieve the highest reported accuracy using half-precision.
Submitted 23 February, 2018; v1 submitted 3 February, 2018;
originally announced February 2018.
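The shared exponent representation mentioned in the abstract above can be sketched as follows: a whole tensor is stored as INT16 mantissas plus a single exponent. This is an illustrative guess at the general shape of such a scheme, not the paper's DFP algorithm. The function names are hypothetical, the exponent is chosen from the largest magnitude only, and the dot product accumulates in 64-bit integers to sidestep the INT32 overflow handling that the abstract refers to but does not detail.

```python
import numpy as np

def to_dfp16(x):
    """Quantise a tensor to int16 mantissas with one shared power-of-two exponent."""
    x = np.asarray(x, dtype=np.float64)
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return np.zeros(x.shape, dtype=np.int16), 0
    # pick the exponent so that the largest magnitude fits in 15 bits plus sign
    exp = int(np.ceil(np.log2(max_abs))) - 15
    q = np.clip(np.rint(x / 2.0 ** exp), -32767, 32767).astype(np.int16)
    return q, exp

def from_dfp16(q, exp):
    """Reconstruct floating-point values from mantissas and shared exponent."""
    return q.astype(np.float64) * 2.0 ** exp

def dfp_dot(a, b):
    """Integer dot product of two quantised vectors, rescaled at the end.

    Assumption: int64 accumulation stands in for an INT32 accumulator
    with explicit overflow management.
    """
    qa, ea = to_dfp16(a)
    qb, eb = to_dfp16(b)
    acc = np.dot(qa.astype(np.int64), qb.astype(np.int64))
    return float(acc) * 2.0 ** (ea + eb)

rng = np.random.default_rng(1)
a = rng.uniform(-1, 1, 64)
b = rng.uniform(-1, 1, 64)
print("float dot:", np.dot(a, b))
print("DFP dot  :", dfp_dot(a, b))
```

Because the exponents add when mantissas are multiplied, the only floating-point operation in `dfp_dot` is the final rescaling.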
-
Necessary and sufficient conditions to perform Spectral Tetris
Authors:
Peter Casazza,
Andreas Heinecke,
Keri Kornelson,
Yang Wang,
Zhengfang Zhou
Abstract:
Spectral Tetris has proved to be a powerful tool for constructing sparse equal norm Hilbert space frames. We introduce a new form of Spectral Tetris which works for non-equal norm frames. It is known that this method cannot construct all frames --- even in the new case introduced here. Until now, it has been a mystery as to why Spectral Tetris sometimes works and sometimes fails. We will give a complete answer to this mystery by giving necessary and sufficient conditions for Spectral Tetris to construct frames in all cases including equal norm frames, prescribed norm frames, frames with constant spectrum of the frame operator, and frames with prescribed spectrum for the frame operator. We present a variety of examples as well as special cases where Spectral Tetris always works.
Submitted 15 April, 2012;
originally announced April 2012.
-
Spectral Tetris Fusion Frame Constructions
Authors:
Peter G. Casazza,
Matthew Fickus,
Andreas Heinecke,
Yang Wang,
Zhengfang Zhou
Abstract:
Spectral tetris is a flexible and elementary method to construct unit norm frames with a given frame operator, having all of its eigenvalues greater than or equal to two. One important application of spectral tetris is the construction of fusion frames. We first show how the assumption on the spectrum of the frame operator can be dropped and extend the spectral tetris algorithm to construct unit norm frames with any given spectrum of the frame operator. We then provide a sufficient condition for using this generalization of spectral tetris to construct fusion frames with prescribed spectrum for the fusion frame operator and with prescribed dimensions for the subspaces. This condition is shown to be necessary in the tight case of redundancy greater than two.
Submitted 9 January, 2012; v1 submitted 19 August, 2011;
originally announced August 2011.
-
Optimally Sparse Frames
Authors:
Peter G. Casazza,
Andreas Heinecke,
Felix Krahmer,
Gitta Kutyniok
Abstract:
Frames have established themselves as a means to derive redundant, yet stable decompositions of a signal for analysis or transmission, while also promoting sparse expansions. However, when the signal dimension is large, the computation of the frame measurements of a signal typically requires a large number of additions and multiplications, and this makes a frame decomposition intractable in applications with limited computing budget. To address this problem, in this paper, we focus on frames in finite-dimensional Hilbert spaces and introduce sparsity for such frames as a new paradigm. In our terminology, a sparse frame is a frame whose elements have a sparse representation in an orthonormal basis, thereby enabling low-complexity frame decompositions. To introduce a precise meaning of optimality, we take as sparsity measure the number of vectors of this orthonormal basis needed to expand a frame vector, summed over all frame vectors. We then analyze the recently introduced algorithm Spectral Tetris for construction of unit norm tight frames and prove that the tight frames generated by this algorithm are in fact optimally sparse with respect to the standard unit vector basis. Finally, we show that even the generalization of Spectral Tetris for the construction of unit norm frames associated with a given frame operator produces optimally sparse frames.
Submitted 28 June, 2011; v1 submitted 19 September, 2010;
originally announced September 2010.
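For readers unfamiliar with Spectral Tetris, the tight-frame case referred to in the abstract above can be sketched in a few lines. The code follows the construction as it is commonly described (rows are filled with unit vectors, and a 2x2 block carries leftover weight into the next row); the abstract itself gives no algorithmic detail, so treat this as an illustration under the assumption N >= 2M, with the hypothetical function name `spectral_tetris`.

```python
from fractions import Fraction
import numpy as np

def spectral_tetris(M, N):
    """Return an M x N matrix whose columns form a unit norm tight frame for R^M.

    Assumption: N >= 2M, so every row weight N/M is at least two.
    """
    if N < 2 * M:
        raise ValueError("this sketch assumes N >= 2M")
    lam = [Fraction(N, M)] * M      # weight still to be placed in each row
    F = np.zeros((M, N))
    k = 0                           # next free column
    for j in range(M):
        while lam[j] >= 1:          # place standard unit vectors e_j
            F[j, k] = 1.0
            lam[j] -= 1
            k += 1
        if lam[j] > 0:              # leftover weight x in (0, 1): use a 2x2 block
            x = float(lam[j])
            a, b = np.sqrt(x / 2), np.sqrt(1 - x / 2)
            F[j, k:k + 2] = [a, a]          # contributes x to row j
            F[j + 1, k:k + 2] = [b, -b]     # contributes 2 - x to row j + 1
            lam[j + 1] -= 2 - lam[j]
            lam[j] = 0
            k += 2
    return F

F = spectral_tetris(4, 11)
print(np.round(F @ F.T, 12))                  # (11/4) times the identity
print(np.round(np.sum(F**2, axis=0), 12))     # all column norms equal one
print(np.count_nonzero(F), "nonzero entries")
```

Exact rational bookkeeping via `Fraction` is a design choice of this sketch: it keeps the comparisons `lam[j] >= 1` and `lam[j] > 0` free of floating-point drift, and square roots are taken only when the matrix entries are written.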
-
A quantitative notion of redundancy for infinite frames
Authors:
Jameson Cahill,
Peter G. Casazza,
Andreas Heinecke
Abstract:
Bodmann, Casazza and Kutyniok introduced a quantitative notion of redundancy for finite frames, which they called upper and lower redundancies, that matches better with an intuitive understanding of redundancy for finite frames in a Hilbert space. The objective of this paper is to see how much of this theory generalizes to infinite frames.
Submitted 14 June, 2010;
originally announced June 2010.
-
Fusion Frames: Existence and Construction
Authors:
Robert Calderbank,
Peter G. Casazza,
Andreas Heinecke,
Gitta Kutyniok,
Ali Pezeshki
Abstract:
Fusion frame theory is an emerging mathematical theory that provides a natural framework for performing hierarchical data processing. A fusion frame is a frame-like collection of subspaces in a Hilbert space, thereby generalizing the concept of a frame for signal representation. In this paper, we study the existence and construction of fusion frames. We first present a complete characterization of a special class of fusion frames, called Parseval fusion frames. The value of Parseval fusion frames is that the inverse fusion frame operator is equal to the identity and therefore signal reconstruction can be performed with minimal complexity. We then introduce two general methods -- the spatial complement and the Naimark complement -- for constructing a new fusion frame from a given fusion frame. We then establish existence conditions for fusion frames with desired properties. In particular, we address the following question: Given $M, N, m \in \mathbb{N}$ and $\{\lambda_j\}_{j=1}^M$, does there exist a fusion frame in $\mathbb{R}^M$ with $N$ subspaces of dimension $m$ for which $\{\lambda_j\}_{j=1}^M$ are the eigenvalues of the associated fusion frame operator? We address this problem by providing an algorithm which computes such a fusion frame for almost any collection of parameters $M, N, m \in \mathbb{N}$ and $\{\lambda_j\}_{j=1}^M$. Moreover, we show how this procedure can be applied, if subspaces are to be added to a given fusion frame to force it to become Parseval.
Submitted 30 June, 2009;
originally announced June 2009.
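To make the Parseval property in the abstract above concrete, the sketch below computes a fusion frame operator numerically, using the standard definition $S = \sum_i w_i^2 P_i$ with $P_i$ the orthogonal projection onto the $i$-th subspace. The example (three coordinate planes in $\mathbb{R}^3$ with equal weights) and the function name `fusion_frame_operator` are illustrative choices, not taken from the paper.

```python
import numpy as np

def fusion_frame_operator(bases, weights):
    """Return S = sum_i w_i^2 P_i for subspaces spanned by the columns of each basis.

    Assumption: every basis matrix has full column rank.
    """
    dim = bases[0].shape[0]
    S = np.zeros((dim, dim))
    for B, w in zip(bases, weights):
        Q, _ = np.linalg.qr(B)          # orthonormal basis of the column span
        S += w ** 2 * (Q @ Q.T)         # weighted orthogonal projection
    return S

I3 = np.eye(3)
planes = [I3[:, [0, 1]], I3[:, [0, 2]], I3[:, [1, 2]]]   # three coordinate planes
weights = [1 / np.sqrt(2)] * 3
S = fusion_frame_operator(planes, weights)
print(S)                         # identity matrix: the fusion frame is Parseval
print(np.linalg.eigvalsh(S))     # spectrum of the fusion frame operator
```

Each coordinate direction lies in exactly two of the three planes, so the unweighted projections sum to twice the identity and the weights $1/\sqrt{2}$ rescale this to the identity.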