Showing 1–5 of 5 results for author: Koster, U

  1. arXiv:2403.07657

    cs.LG cs.AI stat.AP stat.ME

    Scalable Spatiotemporal Prediction with Bayesian Neural Fields

    Authors: Feras Saad, Jacob Burnim, Colin Carroll, Brian Patton, Urs Köster, Rif A. Saurous, Matthew Hoffman

    Abstract: Spatiotemporal datasets, which consist of spatially-referenced time series, are ubiquitous in diverse applications, such as air pollution monitoring, disease tracking, and cloud-demand forecasting. As the scale of modern datasets increases, there is a growing need for statistical methods that are flexible enough to capture complex spatiotemporal dynamics and scalable enough to handle many observat…

    Submitted 26 November, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 29 pages, 7 figures, 2 tables, 1 listing

    Journal ref: Nature Communications 15(7942), 2024

  2. arXiv:2007.01397

    cs.LG cs.DC math.OC stat.ML

    Adaptive Braking for Mitigating Gradient Delay

    Authors: Abhinav Venigalla, Atli Kosson, Vitaliy Chiley, Urs Köster

    Abstract: Neural network training is commonly accelerated by using multiple synchronized workers to compute gradient updates in parallel. Asynchronous methods remove synchronization overheads and improve hardware utilization at the cost of introducing gradient delay, which impedes optimization and can lead to lower final model performance. We introduce Adaptive Braking (AB), a modification for momentum-base…

    Submitted 10 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

    Comments: In Beyond First Order Methods in ML Systems workshop at the 37th International Conference on Machine Learning, 2020

  3. arXiv:2003.11666

    cs.LG cs.DC stat.ML

    Pipelined Backpropagation at Scale: Training Large Models without Batches

    Authors: Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster

    Abstract: New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms. In this work we evaluate the use of small batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline parallel training alg…

    Submitted 9 April, 2021; v1 submitted 25 March, 2020; originally announced March 2020.

    Comments: Proceedings of the 4th MLSys Conference, 2021

  4. arXiv:1905.05894

    cs.LG stat.ML

    Online Normalization for Training Neural Networks

    Authors: Vitaliy Chiley, Ilya Sharapov, Atli Kosson, Urs Koster, Ryan Reece, Sofia Samaniego de la Fuente, Vishal Subbiah, Michael James

    Abstract: Online Normalization is a new technique for normalizing the hidden activations of a neural network. Like Batch Normalization, it normalizes the sample dimension. While Online Normalization does not use batches, it is as accurate as Batch Normalization. We resolve a theoretical limitation of Batch Normalization by introducing an unbiased technique for computing the gradient of normalized activation…

    Submitted 3 December, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: Published at the Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. Code: https://github.com/Cerebras/online-normalization

  5. arXiv:1711.02213

    cs.LG math.NA stat.ML

    Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks

    Authors: Urs Köster, Tristan J. Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William H. Constable, Oğuz H. Elibol, Scott Gray, Stewart Hall, Luke Hornof, Amir Khosrowshahi, Carey Kloss, Ruby J. Pai, Naveen Rao

    Abstract: Deep neural networks are commonly developed and trained in 32-bit floating point format. Significant gains in performance and energy efficiency could be realized by training and inference in numerical formats optimized for deep learning. Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem. Here we present the F…

    Submitted 2 December, 2017; v1 submitted 6 November, 2017; originally announced November 2017.

    Comments: 14 pages, 5 figures, accepted in Neural Information Processing Systems 2017