Showing 1–11 of 11 results for author: Nagarakatte, S

  1. arXiv:2504.07409

    cs.MS

    RLibm-MultiRound: Correctly Rounded Math Libraries Without Worrying about the Application's Rounding Mode

    Authors: Sehyeok Park, Justin Kim, Santosh Nagarakatte

    Abstract: Our RLibm project generates a single implementation for an elementary function that produces correctly rounded results for multiple rounding modes and representations with up to 32 bits. Such implementations are appealing for developing fast reference libraries without double rounding issues. The key insight is to build polynomials that produce the correctly rounded result for a representation with two additional…

    Submitted 29 May, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: 31 pages

    Report number: Rutgers Department of Computer Science Technical Report DCS-TR-759

  2. arXiv:2312.15640

    cs.DC cs.SE

    Report of the DOE/NSF Workshop on Correctness in Scientific Computing, June 2023, Orlando, FL

    Authors: Maya Gokhale, Ganesh Gopalakrishnan, Jackson Mayo, Santosh Nagarakatte, Cindy Rubio-González, Stephen F. Siegel

    Abstract: This report is a digest of the DOE/NSF Workshop on Correctness in Scientific Computing (CSC'23) held on June 17, 2023, as part of the Federated Computing Research Conference (FCRC) 2023. CSC was conceived by DOE and NSF to address the growing concerns about correctness among those who employ computational methods to perform large-scale scientific simulations. These concerns have escalated, given t…

    Submitted 27 December, 2023; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: 36 pages. DOE/NSF Workshop on Correctness in Scientific Computing (CSC 2023) was a PLDI 2023 workshop

    ACM Class: B.8.1; C.1.4; D.0.3; D.0.4; D.1.3; D.2.1; D.2.5; D.3.1; G.1.2; J.2

  3. arXiv:2111.12852

    cs.MS

    RLIBM-PROG: Progressive Polynomial Approximations for Fast Correctly Rounded Math Libraries

    Authors: Mridul Aanjaneya, Jay P. Lim, Santosh Nagarakatte

    Abstract: This paper presents a novel method for generating a single polynomial approximation that produces correctly rounded results for all inputs of an elementary function for multiple representations. The generated polynomial approximation has the nice property that the first few lower degree terms produce correctly rounded results for specific representations of smaller bitwidths, which we call progres…

    Submitted 17 March, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: 14 pages

    Report number: Rutgers Department of Computer Science Technical Report DCS-TR-758

  4. arXiv:2108.06756

    cs.MS

    RLIBM-ALL: A Novel Polynomial Approximation Method to Produce Correctly Rounded Results for Multiple Representations and Rounding Modes

    Authors: Jay P. Lim, Santosh Nagarakatte

    Abstract: Mainstream math libraries for floating point (FP) do not produce correctly rounded results for all inputs. In contrast, CR-LIBM and RLIBM provide correctly rounded implementations for a specific FP representation with one rounding mode. Using such libraries for a representation with a new rounding mode or with different precision will result in wrong results due to double rounding. This paper prop…

    Submitted 29 November, 2021; v1 submitted 15 August, 2021; originally announced August 2021.

    Comments: 28 pages

    Report number: Rutgers Department of Computer Science Technical Report DCS-TR-757

  5. arXiv:2107.13386

    cs.AR

    SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication

    Authors: Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

    Abstract: This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the IM2COL transformation with the GEMM computation to maximize parallelism. We p…

    Submitted 24 November, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 24 pages

    Report number: Rutgers Department of Computer Science Technical Report DCS-TR-756

  6. arXiv:2105.05398

    cs.PL

    Sound, Precise, and Fast Abstract Interpretation with Tristate Numbers

    Authors: Harishankar Vishwanathan, Matan Shachnai, Srinivas Narayana, Santosh Nagarakatte

    Abstract: Extended Berkeley Packet Filter (BPF) is a language and run-time system that allows non-superusers to extend the Linux and Windows operating systems by downloading user code into the kernel. To ensure that user code is safe to run in kernel context, BPF relies on a static analyzer that proves properties about the code, such as bounded memory access and the absence of operations that crash. The BPF…

    Submitted 15 December, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: 20 pages

    Report number: Rutgers Department of Computer Science Technical Report DCS-TR-755 (Extended version of the CGO-2022 paper)

  7. arXiv:2104.04043

    cs.MS

    RLIBM-32: High Performance Correctly Rounded Math Libraries for 32-bit Floating Point Representations

    Authors: Jay P. Lim, Santosh Nagarakatte

    Abstract: This paper proposes a set of techniques to develop correctly rounded math libraries for 32-bit float and posit types. It enhances our RLibm approach that frames the problem of generating correctly rounded libraries as a linear programming problem in the context of 16-bit types to scale to 32-bit types. Specifically, this paper proposes new algorithms to (1) generate polynomials that produce correc…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: 23 pages

    Report number: Rutgers Department of Computer Science Technical Report DCS-TR-754

  8. arXiv:2007.05344

    cs.MS

    A Novel Approach to Generate Correctly Rounded Math Libraries for New Floating Point Representations

    Authors: Jay P. Lim, Mridul Aanjaneya, John Gustafson, Santosh Nagarakatte

    Abstract: Given the importance of floating-point (FP) performance in numerous domains, several new variants of FP and its alternatives have been proposed (e.g., Bfloat16, TensorFloat32, and Posits). These representations do not have correctly rounded math libraries. Further, the use of existing FP libraries for these new representations can produce incorrect results. This paper proposes a novel approach for…

    Submitted 20 November, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: 44 pages

    Report number: Rutgers DCS Technical Report 753

  9. arXiv:2004.13907

    cs.DC cs.MS cs.PL

    Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra

    Authors: Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

    Abstract: This paper describes REAP, a software-hardware approach that enables high performance sparse linear algebra computations on a cooperative CPU-FPGA platform. REAP carefully separates the task of organizing the matrix elements from the computation phase. It uses the CPU to provide a first-pass re-organization of the matrix elements, allowing the FPGA to focus on the computation. We introduce a new i…

    Submitted 28 April, 2020; originally announced April 2020.

    Comments: 12 pages

    Report number: Rutgers Computer Science Technical Report DCS-TR-750

  10. arXiv:1705.01522

    cs.PL cs.DC

    A Fast Causal Profiler for Task Parallel Programs

    Authors: Adarsh Yoga, Santosh Nagarakatte

    Abstract: This paper proposes TASKPROF, a profiler that identifies parallelism bottlenecks in task parallel programs. It leverages the structure of a task parallel execution to perform fine-grained attribution of work to various parts of the program. TASKPROF's use of hardware performance counters to perform fine-grained measurements minimizes perturbation. TASKPROF's profile execution runs in parallel usin…

    Submitted 2 July, 2017; v1 submitted 3 May, 2017; originally announced May 2017.

    Comments: 11 pages

    Report number: Rutgers CS Technical Report: DCS-TR-728

  11. arXiv:1611.05980

    cs.PL

    Precondition Inference for Peephole Optimizations in LLVM

    Authors: David Menendez, Santosh Nagarakatte

    Abstract: Peephole optimizations are a common source of compiler bugs. Compiler developers typically transform an incorrect peephole optimization into a valid one by strengthening the precondition. This process is challenging and tedious. This paper proposes ALIVE-INFER, a data-driven approach that infers preconditions for peephole optimizations expressed in Alive. ALIVE-INFER generates positive and negativ…

    Submitted 24 March, 2017; v1 submitted 18 November, 2016; originally announced November 2016.

    Comments: 15 pages

    Report number: Rutgers CS Technical Report: DCS-TR-727