-
Mixed-precision numerics in scientific applications: survey and perspectives
Authors:
Aditya Kashi,
Hao Lu,
Wesley Brewer,
David Rogers,
Michael Matheson,
Mallikarjun Shankar,
Feiyi Wang
Abstract:
The explosive demand for artificial intelligence (AI) workloads has led to a significant increase in silicon area dedicated to lower-precision computations on recent high-performance computing hardware designs. However, mixed-precision capabilities, which can achieve performance improvements of 8x compared to double-precision in extreme compute-intensive workloads, remain largely untapped in most scientific applications. A growing number of efforts have shown that mixed-precision algorithmic innovations can deliver superior performance without sacrificing accuracy. These developments should prompt computational scientists to seriously consider whether their scientific modeling and simulation applications could benefit from the acceleration offered by new hardware and mixed-precision algorithms. In this article, we review the literature on relevant applications, existing mixed-precision algorithms, theories, and the available software infrastructure. We then offer our perspective and recommendations on the potential of mixed-precision algorithms to enhance the performance of scientific simulation applications. Broadly, we find that mixed-precision methods can have a large impact on computational science in terms of time-to-solution and energy consumption. This is true not only for a few arithmetic-dominated applications but also, to a more moderate extent, for the many memory bandwidth-bound applications. In many cases, though, the choice of algorithms and regions of applicability will be domain-specific, and thus require input from domain experts. It is helpful to identify cross-cutting computational motifs and their mixed-precision algorithms in this regard. Finally, there are new algorithms being developed to utilize AI hardware and AI methods to accelerate first-principles computational science, and these should be closely watched as hardware platforms evolve.
Submitted 7 January, 2025; v1 submitted 26 December, 2024;
originally announced December 2024.
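As one illustration of the kind of mixed-precision algorithm the survey entry above refers to, the following is a minimal sketch of iterative refinement for a dense linear system: the solves run in float32 while residuals are accumulated in float64. The function name `mixed_precision_solve`, the test matrix, and the tolerances are assumptions made for this example, not details taken from the article.

```python
import numpy as np


def mixed_precision_solve(A, b, max_iters=10, tol=1e-12):
    """Solve A x = b with float32 solves and float64 residual corrections.

    Hypothetical helper for illustration. A production code would factor
    A once in low precision and reuse the factors; here np.linalg.solve
    simply re-solves with the float32 copy at every step.
    """
    A32 = A.astype(np.float32)
    # Initial solve entirely in single precision.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iters):
        r = b - A @ x  # residual computed in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction computed in single precision, applied in double.
        d = np.linalg.solve(A32, r.astype(np.float32))
        x += d.astype(np.float64)
    return x


rng = np.random.default_rng(0)
n = 200
# Diagonally shifted random matrix: well-conditioned by construction.
A = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = A @ x_true

x32 = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)
x_mp = mixed_precision_solve(A, b)

err32 = np.linalg.norm(x32 - x_true) / np.linalg.norm(x_true)
err_mp = np.linalg.norm(x_mp - x_true) / np.linalg.norm(x_true)
print(f"float32 only: {err32:.2e}, refined: {err_mp:.2e}")
```

The sketch only works this cleanly because the matrix is well-conditioned; for ill-conditioned systems the low-precision solve may not yield a convergent correction, which is the sort of domain-specific limit the abstract alludes to.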
-
Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning
Authors:
Kuangdai Leng,
Mallikarjun Shankar,
Jeyan Thiyagalingam
Abstract:
Automatic differentiation (AD) is a critical step in physics-informed machine learning, required for computing the high-order derivatives of network output w.r.t. coordinates of collocation points. In this paper, we present a novel and lightweight algorithm to conduct AD for physics-informed operator learning, which we call the trick of Zero Coordinate Shift (ZCS). Instead of making all sampled coordinates leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, simplifying the wanted derivatives from "many-roots-many-leaves" to "one-root-many-leaves" whereby reverse-mode AD becomes directly utilisable. It has led to an outstanding performance leap by avoiding the duplication of the computational graph along the dimension of functions (physical parameters). ZCS is easy to implement with current deep learning libraries; our own implementation is achieved by extending the DeepXDE package. We carry out a comprehensive benchmark analysis and several case studies, training physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS has persistently reduced GPU memory consumption and wall time for training by an order of magnitude, and such reduction factor scales with the number of functions. As a low-level optimisation technique, ZCS imposes no restrictions on data, physics (PDE) or network architecture and does not compromise training results in any respect.
Submitted 14 March, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
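The coordinate-shift idea in the entry above can be sketched in equations. The notation here ($f_\theta$, $p_i$, $x_j$, $z$, $a_{ij}$, $\omega$) is introduced for illustration and is an assumption, not the paper's own; the sketch covers one coordinate dimension and first derivatives only.

```latex
% Operator-network output for function (parameter) p_i at coordinate x_j
u_{ij} = f_\theta(p_i, x_j), \qquad i = 1,\dots,M, \quad j = 1,\dots,N

% Shift every coordinate by one scalar leaf variable z, evaluated at z = 0
v_{ij}(z) = f_\theta(p_i, x_j + z), \qquad
\frac{\partial u_{ij}}{\partial x_j}
  = \left. \frac{\partial v_{ij}}{\partial z} \right|_{z=0}

% Collapse the many roots into one scalar root with dummy weights a_{ij}
\omega = \sum_{i,j} a_{ij}\, v_{ij}, \qquad
\frac{\partial v_{ij}}{\partial z}
  = \frac{\partial}{\partial a_{ij}}
    \left( \frac{\partial \omega}{\partial z} \right)
```

Under these assumptions, $\partial \omega / \partial z$ is a scalar differentiated with respect to a scalar, and the subsequent derivative with respect to the weights $a_{ij}$ is a single-root problem, which is the setting where reverse-mode AD is efficient.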
-
A Dimensionally-Reduced Nonlinear Elasticity Model for Liquid Crystal Elastomer Strips with Transverse Curvature
Authors:
Kevin LoGrande,
M. Ravi Shankar,
Kaushik Dayal
Abstract:
Liquid Crystalline Elastomers (LCEs) are active materials that are of interest due to their programmable response to various external stimuli such as light and heat. When exposed to these stimuli, the anisotropy in the response of the material is governed by the nematic director, which is a continuum parameter that is defined as the average local orientation of the mesogens in the liquid crystal phase. This nematic director can be programmed to be heterogeneous in space, creating a vast design space that is useful for applications ranging from artificial ligaments to deployable structures to self-assembling mechanisms. Even when specialized to long and thin strips of LCEs -- the focus of this work -- the vast design space has required the use of numerical simulations to aid in experimental discovery. To mitigate the computational expense of full 3-d numerical simulations, several dimensionally-reduced rod and ribbon models have been developed for LCE strips, but these have not accounted for the possibility of initial transverse curvature, as in a carpenter's tape spring. Motivated by recent experiments showing that transversely-curved LCE strips display a rich variety of configurations, this work derives a dimensionally-reduced 1-d model for pre-curved LCE strips. The 1-d model is validated against full 3-d finite element calculations, and it is also shown to capture experimental observations, including tape-spring-like localizations, in activated LCE strips.
Submitted 23 October, 2023;
originally announced October 2023.
-
On the Performance of One-Bit DoA Estimation via Sparse Linear Arrays
Authors:
Saeid Sedighi,
M. R. Bhavani Shankar,
Mojtaba Soltanalian,
Björn Ottersten
Abstract:
Direction of Arrival (DoA) estimation using Sparse Linear Arrays (SLAs) has recently gained considerable attention in array processing thanks to their capability to provide enhanced degrees of freedom in resolving uncorrelated source signals. Additionally, deployment of one-bit Analog-to-Digital Converters (ADCs) has emerged as an important topic in array processing, as it offers both a low-cost and a low-complexity implementation. In this paper, we study the problem of DoA estimation from one-bit measurements received by an SLA. Specifically, we first investigate the identifiability conditions for the DoA estimation problem from one-bit SLA data and establish an equivalency with the case when DoAs are estimated from infinite-bit unquantized measurements. Towards determining the performance limits of DoA estimation from one-bit quantized data, we derive a pessimistic approximation of the corresponding Cramér-Rao Bound (CRB). This pessimistic CRB is then used as a benchmark for assessing the performance of one-bit DoA estimators. We also propose a new algorithm for estimating DoAs from one-bit quantized data. We investigate the analytical performance of the proposed method by deriving a closed-form expression for the covariance matrix of the asymptotic distribution of the DoA estimation errors and show that it outperforms the existing algorithms in the literature. Numerical simulations are provided to validate the analytical derivations and corroborate the resulting performance improvement.
Submitted 20 October, 2021; v1 submitted 27 December, 2020;
originally announced December 2020.
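The abstract above does not say how second-order statistics are recovered from one-bit data. One standard tool in one-bit signal processing is the arcsine law, which for zero-mean jointly Gaussian signals gives E[sign(x) sign(y)] = (2/π) arcsin(ρ), where ρ is the correlation coefficient. Whether the paper relies on this relation is an assumption; the sketch below only checks the law numerically for a real-valued pair, with `rho`, `n`, and the variable names chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6      # true correlation coefficient (illustrative value)
n = 200_000    # number of snapshots (illustrative value)

# Draw zero-mean, unit-variance, correlated real Gaussian samples.
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# One-bit quantization keeps only the sign of each sample.
q = np.sign(x)

# Sample correlation of the one-bit data.
r_onebit = np.mean(q[:, 0] * q[:, 1])

# Invert the arcsine law to estimate the unquantized correlation.
rho_hat = np.sin(0.5 * np.pi * r_onebit)

print(f"one-bit correlation: {r_onebit:.4f}")
print(f"recovered rho:       {rho_hat:.4f} (true {rho})")
```

The point of the check is that, for Gaussian inputs, sign-only data still determines the normalized correlation; the amplitude information is lost.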
-
Localization with One-Bit Passive Radars in Narrowband Internet-of-Things using Multivariate Polynomial Optimization
Authors:
Saeid Sedighi,
Kumar Vijay Mishra,
M. R. Bhavani Shankar,
Björn Ottersten
Abstract:
Several Internet-of-Things (IoT) applications provide location-based services, wherein it is critical to obtain accurate position estimates by aggregating information from individual sensors. In the recently proposed narrowband IoT (NB-IoT) standard, which trades off bandwidth to gain wide coverage, location estimation is complicated by low-sampling-rate receivers and limited-capacity links. We address both of these NB-IoT drawbacks in the framework of passive sensing devices that receive signals from the target-of-interest. We consider the limiting case where each node receiver employs one-bit analog-to-digital converters and propose a novel low-complexity nodal delay estimation method using constrained-weighted least squares minimization. To support the low-capacity links to the fusion center (FC), the range estimates obtained at individual sensors are then converted to one-bit data. At the FC, we propose target localization with the aggregated one-bit range vector using both optimal and sub-optimal techniques. The computationally expensive former approach is based on Lasserre's method for multivariate polynomial optimization while the latter employs our less complex iterative joint range-target location estimation (ANTARES) algorithm. Our overall one-bit framework not only complements the low NB-IoT bandwidth but also supports the design goal of inexpensive NB-IoT location sensing. Numerical experiments demonstrate the feasibility of the proposed one-bit approach with a 0.6% increase in the normalized localization error for the small set of 20-60 nodes over the full-precision case. When the number of nodes is sufficiently large (>80), the one-bit methods yield the same performance as the full-precision case.
Submitted 9 April, 2021; v1 submitted 29 July, 2020;
originally announced July 2020.