Floating Point Numbers  Page 2
 
Historic Format (ANSI/IEEE)
 
During the 1980s, the Institute of Electrical and Electronics Engineers (IEEE) developed a binary format for floating point numbers in radix 2 (base 2), to be used with the electronic circuits that were being developed at the time. That format was adopted by the American National Standards Institute (ANSI) as ANSI/IEEE Std 754-1985. Numbers that use that format are said to be in ANSI/IEEE format.
 
Basic coverage of the ANSI/IEEE format is provided in many books, including: Antia, Numerical Methods for Scientists & Engineers 2/e, p. 24; Yang, Cao, Chung & Morris, Applied Numerical Methods…, p. 28; Chapra & Canale, Numerical Methods for Engineers 5/e, p. 60; Quarteroni, Sacco & Saleri, Numerical Mathematics 2/e, p. 47; Capper, C++ for Scientists, Engineers and Mathematicians 2/e, p. 325; Parker, Algorithms and Data Structures in C++, p. 16; and Parhami, Computer Architecture, p. 173.
 
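An ANSI/IEEE number consists of bits packed into a group of adjacent bytes. The high order bit of the overall packet is a sign bit, the next bits are a biased exponent, and the remaining bits are the significand. Because the exponent occupies higher order bits than the significand, comparing two numbers of the same sign reduces to comparing their bit patterns as integers. That ordering of bits allows for easy machine sorting of numbers.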
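As a rough illustration of that layout, the following Python sketch splits a 32-bit single length number into its three fields. The function names are invented for this example. The field widths (1 sign bit, 8 exponent bits, 23 stored significand bits) are those of the standard single length format and are an assumption here, because they are not stated above.

```python
import struct

def single_bits(x):
    """Return the 32-bit pattern of x rounded to single length, as an unsigned int."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

def split_single(x):
    """Return (sign, biased_exponent, stored_significand) for a single length number."""
    bits = single_bits(x)
    sign = bits >> 31                      # high order bit
    biased_exponent = (bits >> 23) & 0xFF  # next 8 bits
    stored_significand = bits & 0x7FFFFF   # remaining 23 bits
    return sign, biased_exponent, stored_significand

print(split_single(1.0))    # (0, 127, 0)
print(split_single(-2.5))   # (1, 128, 2097152)

# For non-negative numbers, sorting by bit pattern matches sorting by value.
values = [1024.0, 0.5, 2.0, 1.5, 0.0, 1.0]
sorted_by_bits = sorted(values, key=single_bits)
print(sorted_by_bits == sorted(values))   # True
```

The sorting check uses non-negative values only. Negative numbers are stored as sign plus magnitude, so their bit patterns order in reverse and a real sort routine would need to handle the sign bit separately.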
 
There are two variants of the ANSI/IEEE format. The first variant is the single length format, which is 32 bits including a 24-bit significand. The second variant is the double length format, which is 64 bits including a 53-bit significand. Both significand widths count the integer bit described below. Those two variants are external formats. There is also an internal IEEE variant which is 80 bits. For the bit layouts of these types of numbers, see: AMD64 Architecture Programmer’s Manual Vol. 1: Application Programming Rev 3.14 (free download), Sec 6.3.2, Fig 6.8.
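The significand widths can be observed from a program. The sketch below assumes that Python's float is an IEEE double length number, which holds on common platforms but is not guaranteed by the language. The helper name to_single is invented for this example.

```python
import struct
import sys

def to_single(x):
    """Round x to the nearest single length number and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

single_size_bits = struct.calcsize("<f") * 8   # 32
double_size_bits = struct.calcsize("<d") * 8   # 64
double_significand_bits = sys.float_info.mant_dig  # 53 on IEEE platforms

# A 24-bit significand holds 2**24 exactly, but 2**24 + 1 needs 25 bits
# and is rounded back to 2**24.
single_rounds = to_single(2.0**24 + 1) == 2.0**24      # True

# A 53-bit significand holds 2**53 exactly, but 2**53 + 1 needs 54 bits.
double_rounds = float(2**53 + 1) == float(2**53)       # True

print(single_size_bits, double_size_bits, double_significand_bits)
print(single_rounds, double_rounds)
```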
 
Integer Bit
 
A floating point number is unique only if it is normalized. Bit-wise normalization also assures maximum digit-wise precision: the significand of a normalized number does not have leading zero digits. Therefore, floating point numbers are supposed to be normalized.
 
Valid normalized numbers other than zero always have a non-zero digit as the leading digit of the significand. That will be the only digit to the left (on the whole number side) of the radix point.
 
Note: In decimal (base 10) numbers, the radix point is called a “decimal point”. In the other number systems, including binary (base 2), that point is simply called the radix point.
 
IEEE numbers have binary digits — the only possible non-zero digit is 1 — so the leading digit of the significand is 1 if the number is a valid non-zero number.
 
That leading bit is called the integer bit because it is the only digit to the left of the radix point in a normalized number. All the other bits represent the fraction of the normalized number.
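To see why normalization makes a representation unique, consider a toy model in which a value is an integer significand times a power of 2. The sketch below is only an illustration: the function name, the 24-bit width, and the integer significand convention are assumptions made for the example, not part of the IEEE format.

```python
def normalize(significand, exponent, width=24):
    """Shift a (significand, exponent) pair until the leading bit of a
    width-bit significand is 1. The value significand * 2**exponent is
    unchanged as long as no low order bits are shifted out."""
    if significand == 0:
        return 0, 0
    while significand < (1 << (width - 1)):
        significand <<= 1   # remove one leading zero
        exponent -= 1       # compensate so the value stays the same
    while significand >= (1 << width):
        significand >>= 1   # too wide: drops the low bit (truncation, no rounding)
        exponent += 1
    return significand, exponent

# Three different unnormalized encodings of the value 3.0 ...
a = normalize(3, 0)
b = normalize(6, -1)
c = normalize(12, -2)

# ... become one and the same normalized encoding.
print(a, b, c)   # (12582912, -22) three times
```

After normalization, the top bit of the 24-bit significand is always 1 for a non-zero value, which is the integer bit discussed in this section.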
 
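The IEEE format keeps track of when a number is zero or not normalized (a denormalized number, described under Denormal below), which is when the integer bit is 0 (otherwise the integer bit is 1). In the external formats, that case is marked by a biased exponent field whose bits are all zero. Thus, you do not need to store the integer bit to know what it is. If the IEEE number is zero or denormalized, you can assume its integer bit is 0, otherwise its integer bit is 1.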
 
When the integer bit is not stored, it is a hidden bit. The two external ANSI/IEEE format variants do not store the integer bit (it is a hidden bit in those numbers). The internal IEEE format does store the integer bit, so that arithmetic operations can be performed on the number.
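The hidden bit can be restored when a number is unpacked. The following sketch does that for a double length number, assuming Python's float is an IEEE double. The function name is invented for this example, and infinities and NaNs (exponent field all ones) are not handled.

```python
import struct

def significand_with_integer_bit(x):
    """Return (integer_bit, full 53-bit significand) for a finite double x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    biased_exponent = (bits >> 52) & 0x7FF
    stored_fraction = bits & ((1 << 52) - 1)   # the 52 bits actually stored
    # An exponent field of all zeros marks zero or a denormalized number,
    # so the integer bit is 0 there; otherwise it is 1.
    integer_bit = 0 if biased_exponent == 0 else 1
    return integer_bit, (integer_bit << 52) | stored_fraction

bit, sig = significand_with_integer_bit(1.5)
print(bit, sig / 2**52)    # 1 1.5

# 6.0 is 1.5 * 2**2, so it has the same significand and a different exponent.
print(significand_with_integer_bit(6.0) == (bit, sig))   # True

print(significand_with_integer_bit(0.0))   # (0, 0)
```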
 
Exponent
 
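The exponent of an IEEE number is biased, which means it is stored as an unsigned number and must have a bias subtracted from it to retrieve the actual exponent. The bias is 127 for the single length format, 1023 for the double length format, and 16383 for the 80-bit internal format. For a listing of the bias values, see the AMD manual referenced above.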
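As a small sketch of the bias subtraction, the function below extracts the exponent field of a double length number and subtracts 1023, the bias of that format. It assumes Python's float is an IEEE double, and the names are invented for this example. The result is cross-checked against math.frexp, which reports an exponent that is one larger because it scales the significand into the range 0.5 to 1 rather than 1 to 2.

```python
import math
import struct

DOUBLE_BIAS = 1023   # bias of the double length format

def unbiased_exponent(x):
    """Return the actual (unbiased) exponent of a normalized double x."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    biased = (bits >> 52) & 0x7FF        # 11-bit exponent field
    if biased == 0 or biased == 0x7FF:   # zero/denormal, or infinity/NaN
        raise ValueError("not a normalized number")
    return biased - DOUBLE_BIAS

samples = [1.0, 8.0, 0.375, -10.0]
exponents = [unbiased_exponent(x) for x in samples]
print(exponents)   # [0, 3, -2, 3]

# Cross-check against the standard library.
print(all(unbiased_exponent(x) == math.frexp(x)[1] - 1 for x in samples))   # True
```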
 
Denormal
 
IEEE numbers have a limited range. Only a few bits of storage are used to store an IEEE exponent (8 bits in the single length format and 11 bits in the double length format). Since so few bits are available, not many exponents can be represented. A non-zero number can be too small to be represented as a normalized IEEE number, because the unbiased IEEE exponent cannot be small enough.
 
If a number is much smaller than a normal exponent would allow, that number can be converted to zero. If the number is only slightly smaller than IEEE can represent, it can be represented as a denormalized number with degraded precision but still non-zero.
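The behavior described above can be observed with double length numbers. The sketch below assumes Python's float is an IEEE double and that the platform does not flush denormalized numbers to zero. The variable names are invented for this example.

```python
import math
import sys

smallest_normal = sys.float_info.min         # 2**-1022, smallest normalized double
smallest_denormal = math.ldexp(1.0, -1074)   # 2**-1074, smallest denormalized double

# Slightly too small for a normal exponent: kept as a non-zero denormal.
half_of_normal = smallest_normal / 2
print(half_of_normal > 0.0, half_of_normal < smallest_normal)   # True True

# Much too small: converted to zero.
underflowed = smallest_denormal / 2
print(underflowed == 0.0)   # True

# Degraded precision: the smallest denormal has a single significant bit,
# so multiplying it by 1.25 rounds straight back to the same number.
# The same multiplication applied to 1.0 keeps its fraction.
print(smallest_denormal * 1.25 == smallest_denormal)   # True
print(1.0 * 1.25 == 1.0)                               # False
```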
 
When you implement your own floating point numbers, you do not need to support denormalized numbers internally because you can specify a larger range of exponents than IEEE supports.
 
Extremely small values are often only used in intermediate calculations. If you avoid using IEEE numbers for intermediate calculations, you can avoid working with denormals.
 
 
Copyright © 2008 by 3D Software. All rights reserved.
3D Software, P.O. Box 221190, Sacramento CA 95822 USA
www.3DSoftware.com     Contact us
Thursday, 20-Nov-2008 11:57:26 GMT