Showing 1–5 of 5 results for author: Balanca, P

Searching in archive cs.
  1. arXiv:2509.17791 [pdf, ps, other]

    cs.LG

    Elucidating the Design Space of FP4 training

    Authors: Robert Hu, Carlo Luschi, Paul Balanca

    Abstract: The increasing computational demands of foundation models have spurred research into low-precision training, with 4-bit floating-point (FP4) formats emerging as a frontier for maximizing hardware throughput. While numerous techniques have been proposed to stabilize FP4 training, they often present isolated solutions with varying, and not always clear, computational overheads. Thi…

    Submitted 22 September, 2025; originally announced September 2025.

  2. arXiv:2407.17353 [pdf, other]

    cs.LG

    Scalify: scale propagation for efficient low-precision LLM training

    Authors: Paul Balança, Sam Hosegood, Carlo Luschi, Andrew Fitzgibbon

    Abstract: Low-precision formats such as float8 have been introduced in machine learning accelerated hardware to improve computational efficiency for large language model training and inference. Nevertheless, adoption by the ML community has been slowed down by the complex, and sometimes brittle, techniques required to match higher precision training accuracy. In this work, we present Scalify, an end-to-end…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures, ICML 2024 WANT workshop

    MSC Class: 68T07

    ACM Class: I.2.7

  3. arXiv:2402.04030 [pdf, other]

    cs.LG

    Reducing the Cost of Quantum Chemical Data By Backpropagating Through Density Functional Theory

    Authors: Alexander Mathiasen, Hatem Helal, Paul Balanca, Adam Krzywaniak, Ali Parviz, Frederik Hvilshøj, Blazej Banaszewski, Carlo Luschi, Andrew William Fitzgibbon

    Abstract: Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the biggest problem one faces when scaling to larger molecules is the cost of DFT labels. For example, it took years to create the PCQ dataset (Nakata & Shimaz…

    Submitted 6 February, 2024; originally announced February 2024.

  4. arXiv:2311.01135 [pdf, other]

    cs.LG physics.chem-ph

    Generating QM1B with PySCF$_{\text{IPU}}$

    Authors: Alexander Mathiasen, Hatem Helal, Kerstin Klaser, Paul Balanca, Josef Dean, Carlo Luschi, Dominique Beaini, Andrew Fitzgibbon, Dominic Masters

    Abstract: The emergence of foundation models in Computer Vision and Natural Language Processing has resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples. Th…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 15 pages, 7 figures. NeurIPS 2023 Track Datasets and Benchmarks

    ACM Class: I.2.6; J.2

  5. arXiv:2309.17224 [pdf, other]

    cs.LG cs.AR cs.CL cs.ET cs.PF

    Training and inference of large language models using 8-bit floating point

    Authors: Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon

    Abstract: FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although there exists ample literature about selecting such scalings for INT formats, this critical aspect h…

    Submitted 29 September, 2023; originally announced September 2023.

    ACM Class: I.2.7; B.2.4