
Showing 1–4 of 4 results for author: Natesh, V

Searching in archive cs.
  1. arXiv:2504.09072 [pdf, other]

    cs.AR cs.LG

    MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation

    Authors: Vikas Natesh, H. T. Kung, David Kong

    Abstract: We offer a novel approach, MGS (Markov Greedy Sums), to improve the accuracy of low-bitwidth floating-point dot products in neural network computations. In conventional 32-bit floating-point summation, adding values with different exponents may lead to loss of precision in the mantissa of the smaller term, which is right-shifted to align with the larger term's exponent. Such shifting (a.k.a. 'swam…

    Submitted 12 April, 2025; originally announced April 2025.

  2. arXiv:2504.09064 [pdf, other]

    cs.LG cs.AI

    PQS (Prune, Quantize, and Sort): Low-Bitwidth Accumulation of Dot Products in Neural Network Computations

    Authors: Vikas Natesh, H. T. Kung

    Abstract: We present PQS, which uses three techniques together (Prune, Quantize, and Sort) to achieve low-bitwidth accumulation of dot products in neural network computations. In conventional quantized (e.g., 8-bit) dot products, partial results are accumulated into wide (e.g., 32-bit) accumulators so that intermediate partial sums do not overflow. However, such wide accumulators increase mem…

    Submitted 11 April, 2025; originally announced April 2025.

  3. arXiv:2307.03930 [pdf, other]

    cs.LG cs.AR cs.PF cs.PL

    Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

    Authors: Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting

    Abstract: We propose Rosko (row skipping outer products) for deriving sparse matrix multiplication (SpMM) kernels that reduce the computation and memory access requirements of deep neural networks (DNNs). Rosko allows entire row computations to be skipped during program execution with low sparsity-management overhead. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to e…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Rosko's CPU implementation can be found at https://github.com/vnatesh/Rosko

  4. arXiv:2304.05544 [pdf, other]

    cs.LG cs.AR cs.PF cs.PL

    MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

    Authors: Andrew Sabot, Vikas Natesh, H. T. Kung, Wei-Te Ting

    Abstract: We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in th…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: Accepted as a full paper by the TinyML Research Symposium 2023
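The MGS abstract (result 1) attributes precision loss to swamping: when two floating-point values with different exponents are added, the smaller term's mantissa is right-shifted to align with the larger exponent and its low bits are lost. A minimal sketch of that effect, assuming IEEE 754 half precision (fp16) as the low-bitwidth format and using the `'e'` format code of Python's `struct` module for rounding. The helper names are hypothetical, and the smallest-first ordering shown is a generic mitigation, not the MGS algorithm.

```python
import struct
from typing import Iterable


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_running_sum(values: Iterable[float]) -> float:
    """Accumulate left to right, rounding the running sum to fp16 after every add.

    The exact sum of two fp16 values fits in a Python float (double), so rounding
    that sum once models a single correctly rounded fp16 addition.
    """
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


if __name__ == "__main__":
    values = [2048.0] + [1.0] * 1024          # exact sum is 3072
    print(to_fp16(2049.0))                    # 2048.0: the added 1.0 is swamped
    print(fp16_running_sum(values))           # 2048.0: every 1.0 is lost
    print(fp16_running_sum(sorted(values)))   # 3072.0: small terms combine first
```

In fp16 the gap between adjacent representable values near 2048 is 2, so 2048 + 1 is a tie that rounds to even, back to 2048, and each of the 1024 ones is swamped in turn. Adding the small terms first lets them combine into 1024, which then meets 2048 without loss.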
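The PQS abstract (result 2) notes that quantized dot products normally use wide (e.g., 32-bit) accumulators so that intermediate partial sums cannot overflow. The sketch below, assuming a hypothetical signed 16-bit accumulator and int8 operands, shows that overflow depends on the order of accumulation and not only on the final result: the same products overflow in one order and stay in range in another. The `interleave_by_sign` reordering is an illustrative heuristic, not the sorting procedure from the paper.

```python
from typing import List, Tuple

# Hypothetical narrow accumulator: signed 16-bit range.
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def accumulate(products: List[int], lo: int = INT16_MIN, hi: int = INT16_MAX) -> Tuple[int, bool]:
    """Sum products in the given order.

    Returns (final_sum, overflowed), where overflowed is True if any running
    partial sum left [lo, hi]. Python ints never wrap, so overflow is only detected.
    """
    acc = 0
    overflowed = False
    for p in products:
        acc += p
        if acc < lo or acc > hi:
            overflowed = True
    return acc, overflowed


def interleave_by_sign(products: List[int]) -> List[int]:
    """Illustrative reordering: alternate the largest positives with the most negative terms."""
    pos = sorted((p for p in products if p >= 0), reverse=True)
    neg = sorted(p for p in products if p < 0)
    out: List[int] = []
    for i in range(max(len(pos), len(neg))):
        if i < len(pos):
            out.append(pos[i])
        if i < len(neg):
            out.append(neg[i])
    return out


if __name__ == "__main__":
    activations = [127] * 6                   # int8 operands
    weights = [127] * 3 + [-127] * 3
    products = [a * w for a, w in zip(activations, weights)]  # each is +/-16129
    print(accumulate(products))                      # (0, True): 3 * 16129 > 32767
    print(accumulate(interleave_by_sign(products)))  # (0, False)
```

The final dot product (0) fits easily in 16 bits; only the transient partial sum 48387 does not, which is why the accumulation order matters for narrow accumulators.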
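The Rosko abstract (result 3) derives SpMM kernels from outer products and skips entire row computations. A minimal sketch of that general idea, assuming a sparse matrix A stored column by column as (row, value) pairs and a dense matrix B: C is formed as the sum over k of A[:, k] times B[k, :], and each zero entry A[i][k] skips a whole length-N update of row i of C. The function names are hypothetical, and this plain-Python illustration is not Rosko's kernel (the listing points to its CPU implementation at https://github.com/vnatesh/Rosko).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def compress_columns(A: Matrix) -> List[List[Tuple[int, float]]]:
    """For each column k of A, keep only the (row, value) pairs of nonzero entries."""
    rows, cols = len(A), len(A[0])
    return [[(i, A[i][k]) for i in range(rows) if A[i][k] != 0] for k in range(cols)]


def outer_product_spmm(A: Matrix, B: Matrix) -> Tuple[Matrix, int]:
    """Compute C = A @ B as a sum of outer products A[:, k] x B[k, :].

    Returns (C, row_updates). Zero entries of A are absent from the compressed
    columns, so the corresponding row updates of C are skipped entirely.
    """
    m, n = len(A), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    row_updates = 0
    for k, column in enumerate(compress_columns(A)):
        b_row = B[k]
        for i, a_ik in column:
            c_row = C[i]
            for j in range(n):
                c_row[j] += a_ik * b_row[j]
            row_updates += 1
    return C, row_updates


if __name__ == "__main__":
    A = [[1.0, 0.0, 2.0],
         [0.0, 0.0, 0.0],
         [0.0, 3.0, 0.0]]
    B = [[1.0, 2.0],
         [3.0, 4.0],
         [5.0, 6.0]]
    C, updates = outer_product_spmm(A, B)
    print(C)        # [[11.0, 14.0], [0.0, 0.0], [9.0, 12.0]]
    print(updates)  # 3 row updates instead of 9 for a dense A
```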
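The MEMA abstract (result 4) describes analytically choosing schedules that minimize external memory accesses for matrix multiplication under hardware resource constraints. As a rough illustration of that kind of analysis, the sketch below uses a hypothetical cost model that is not MEMA's: C = A·B, with A of size M×K and B of size K×N, is computed in mt×nt output tiles that stay on chip; the on-chip buffer must hold the C tile plus one column slice of A and one row slice of B (mt·nt + mt + nt words); and each A element is fetched once per column tile while each B element is fetched once per row tile. An exhaustive search then picks the tile shape with the fewest external accesses.

```python
import math
from typing import Optional, Tuple


def external_accesses(M: int, N: int, K: int, mt: int, nt: int) -> int:
    """Hypothetical cost model for output-stationary tiling of C = A @ B.

    A (M x K) is re-fetched once per column tile of C, B (K x N) once per
    row tile of C, and each element of C is written out once.
    """
    a_reads = M * K * math.ceil(N / nt)
    b_reads = K * N * math.ceil(M / mt)
    c_writes = M * N
    return a_reads + b_reads + c_writes


def best_tiling(M: int, N: int, K: int, buffer_words: int) -> Optional[Tuple[int, int, int]]:
    """Return (mt, nt, cost) minimizing external accesses, or None if nothing fits.

    A tile fits if the buffer holds the mt x nt C tile plus one mt-element
    column slice of A and one nt-element row slice of B.
    """
    best = None
    for mt in range(1, M + 1):
        for nt in range(1, N + 1):
            if mt * nt + mt + nt > buffer_words:
                continue
            cost = external_accesses(M, N, K, mt, nt)
            if best is None or cost < best[2]:
                best = (mt, nt, cost)
    return best


if __name__ == "__main__":
    print(best_tiling(8, 8, 8, buffer_words=80))  # (8, 8, 192): all of C fits on chip
    print(best_tiling(8, 8, 8, buffer_words=24))  # (4, 4, 320)
```

Shrinking the buffer forces smaller tiles, and smaller tiles mean A and B are re-fetched more often; a real framework would also model the hardware's actual memory hierarchy and kernel shapes, which this toy model ignores.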