Showing 1–4 of 4 results for author: Dixit, H D

  1. arXiv:2405.01741  [pdf, other]

    cs.CR cs.AI cs.AR cs.LG

    PVF (Parameter Vulnerability Factor): A Scalable Metric for Understanding AI Vulnerability Against SDCs in Model Parameters

    Authors: Xun Jiao, Fred Lin, Harish D. Dixit, Joel Coburn, Abhinav Pandey, Han Wang, Venkat Ramesh, Jianyu Huang, Wang Xu, Daniel Moore, Sriram Sankar

    Abstract: Reliability of AI systems is a fundamental concern for the successful deployment and widespread adoption of AI technologies. Unfortunately, the escalating complexity and heterogeneity of AI hardware systems make them increasingly susceptible to hardware faults, e.g., silent data corruptions (SDC), that can potentially corrupt model parameters. When this occurs during AI inference/servicing, it can…

    Submitted 11 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  2. arXiv:2203.08989  [pdf, other]

    cs.AR cs.DC cs.SE

    Detecting silent data corruptions in the wild

    Authors: Harish Dattatraya Dixit, Laura Boyle, Gautham Vunnam, Sneha Pendharkar, Matt Beadon, Sriram Sankar

    Abstract: Silent errors within hardware devices occur when an internal defect manifests in a part of the circuit which does not have check logic to detect the incorrect circuit operation. The results of such a defect can range from flipping a single bit in a single data value, up to causing the software to execute the wrong instructions. Silent data corruptions (SDC) in hardware impact computational integri…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: 7 pages, 4 figures, 1 table, 31 references

  3. arXiv:2103.00130  [pdf, other]

    cs.DC

    Efficient Soft-Error Detection for Low-precision Deep Learning Recommendation Models

    Authors: Sihuan Li, Jianyu Huang, Ping Tak Peter Tang, Daya Khudia, Jongsoo Park, Harish Dattatraya Dixit, Zizhong Chen

    Abstract: Soft error, namely silent corruption of signal or datum in a computer system, cannot be cavalierly ignored as compute and communication density grow exponentially. Soft error detection has been studied in the context of enterprise computing, high-performance computing and more recently in convolutional neural networks related to autonomous driving. Deep learning recommendation systems (DLRMs) hav…

    Submitted 27 February, 2021; originally announced March 2021.

  4. arXiv:2102.11245  [pdf, other]

    cs.AR cs.DC

    Silent Data Corruptions at Scale

    Authors: Harish Dattatraya Dixit, Sneha Pendharkar, Matt Beadon, Chris Mason, Tejasvi Chakravarthy, Bharath Muthiah, Sriram Sankar

    Abstract: Silent Data Corruption (SDC) can have a negative impact on large-scale infrastructure services. SDCs are not captured by error reporting mechanisms within a Central Processing Unit (CPU) and hence are not traceable at the hardware level. However, the data corruptions propagate across the stack and manifest as application-level problems. These types of errors can result in data loss and can require m…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: 8 pages, 3 figures, 33 references