On the precision attainable with various floating-point number systems

Brent, Richard P.

arXiv:1004.3374 (cs)

[Submitted on 20 Apr 2010]

Abstract:For scientific computations on a digital computer the set of real numbers is usually approximated by a finite set F of "floating-point" numbers. We compare the numerical accuracy possible with different choices of F having approximately the same range and requiring the same word length. In particular, we compare different choices of base (or radix) in the usual floating-point systems. The emphasis is on the choice of F, not on the details of the number representation or the arithmetic, but both rounded and truncated arithmetic are considered. Theoretical results are given, and some simulations of typical floating-point computations (forming sums, solving systems of linear equations, finding eigenvalues) are described. If the leading fraction bit of a normalized base 2 number is not stored explicitly (saving a bit), and the criterion is to minimize the mean square roundoff error, then base 2 is best. If unnormalized numbers are allowed, so the first bit must be stored explicitly, then base 4 (or sometimes base 8) is the best of the usual systems.
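The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experiment; it is a minimal illustration under stated assumptions: inputs are drawn log-uniformly, arithmetic is rounded (not truncated), and a base 4 system is given one extra fraction bit because it needs one fewer exponent bit to cover the same range. The function names `round_to_base`, `rms_relative_error`, and `compare`, and the 23-bit baseline, are hypothetical choices made for this example.

```python
import math
import random


def round_to_base(x, base, digits):
    """Round positive x to `digits` significant digits in the given base."""
    # Find e with base**(e-1) <= x < base**e, so the fraction lies in [1/base, 1).
    e = math.floor(math.log(x) / math.log(base)) + 1
    # Guard against floating-point error in the logarithm near exact powers.
    while float(base) ** (e - 1) > x:
        e -= 1
    while float(base) ** e <= x:
        e += 1
    # For bases that are powers of 2 the scaling below is exact in binary doubles.
    scale = float(base) ** (digits - e)
    return round(x * scale) / scale


def rms_relative_error(base, digits, samples=20000, seed=0):
    """Root-mean-square relative rounding error over log-uniform inputs (an assumption)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Log-uniform on [1, 2**16): a whole number of base 2 and base 4 exponent steps.
        x = 2.0 ** rng.uniform(0.0, 16.0)
        err = (round_to_base(x, base, digits) - x) / x
        total += err * err
    return math.sqrt(total / samples)


def compare(bits=23):
    """Compare three hypothetical systems sharing the same word length."""
    return {
        # Leading bit stored explicitly: `bits` significant binary digits.
        "base 2, explicit leading bit": rms_relative_error(2, bits),
        # Leading bit implied by normalization: one extra significant bit.
        "base 2, hidden leading bit": rms_relative_error(2, bits + 1),
        # One exponent bit moved to the fraction: (bits + 1) / 2 base 4 digits.
        "base 4, explicit digits": rms_relative_error(4, (bits + 1) // 2),
    }


if __name__ == "__main__":
    for name, err in compare().items():
        print(f"{name}: {err:.3e}")
```

Under these assumptions the hidden-bit base 2 system should show the smallest root-mean-square error, with base 4 falling between it and base 2 with an explicitly stored leading bit, which is consistent with the ordering stated in the abstract. Truncated arithmetic and the other bases the paper considers are not covered by this sketch.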

Comments:	Corrected version of an old paper (predating the IEEE floating point standard). For details see this http URL
Subjects:	Numerical Analysis (math.NA)
MSC classes:	65Y04
ACM classes:	G.1.0
Report number:	Report TR RC 3751, IBM Research, Yorktown Heights, New York (February 1972), 28 pages
Cite as:	arXiv:1004.3374 [cs.NA]
	(or arXiv:1004.3374v1 [cs.NA] for this version)
	https://doi.org/10.48550/arXiv.1004.3374
Journal reference:	IEEE Transactions on Computers C-22 (1973), 601-607

Submission history

From: Richard Brent
[v1] Tue, 20 Apr 2010 08:17:24 UTC (14 KB)
