Floating Point Numbers
 
Floating point numbers are a finite subset of the real numbers, each stored with limited precision. They are used to represent real numbers, but not every real number can be represented exactly as a floating point number; those that cannot are approximated by a nearby floating point number. As a result, the set of floating point numbers (denoted F) behaves differently from the set of real numbers (denoted R). For example, addition in F is not always associative.
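The difference between F and R is easy to observe in practice. The following is a minimal sketch, assuming a Python interpreter whose `float` type is a binary floating point number (true on practically all current platforms); the helper names `exact_value` and `is_exact` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

def exact_value(x: float) -> Fraction:
    """Return the exact rational number actually stored in the float x."""
    return Fraction(x)

def is_exact(decimal_text: str) -> bool:
    """True if the decimal literal is stored without any approximation."""
    return Fraction(float(decimal_text)) == Fraction(decimal_text)

print(is_exact("0.5"))   # True: 0.5 is 1/2, a power of two
print(is_exact("0.1"))   # False: 1/10 has no finite binary expansion

# The stored value of 0.1 is a nearby rational with a power-of-two denominator.
print(exact_value(0.1))

# Arithmetic in F does not follow all the rules of arithmetic in R.
print(0.1 + 0.2 == 0.3)                          # False
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))    # False: addition is not associative
```

Comparing against `Fraction` works here because a `Fraction` holds an exact rational value, so any difference it reports comes from the floating point approximation alone.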
 
Floating point numbers are the binary counterpart of scientific notation: each number is a significand multiplied by a power of two. They have internal and external formats. The internal formats are used for performing arithmetic operations on the numbers. The external formats are used for exporting (providing) the numbers to programs and other computers.
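The analogy with scientific notation can be made concrete by splitting a number into a significand and a power of two. This is a minimal sketch using Python's standard `math.frexp` and `math.ldexp`; the function name `decompose` and the normalization to a significand between 1 and 2 are choices made for this illustration, not a description of any particular internal format.

```python
import math

def decompose(x: float):
    """Split x into (significand, exponent) so that
    x == significand * 2**exponent, with 1 <= |significand| < 2
    for any nonzero finite x."""
    m, e = math.frexp(x)      # frexp returns 0.5 <= |m| < 1
    return m * 2, e - 1       # rescale to the 1 <= |m| < 2 convention

sig, exp = decompose(6.5)
print(sig, exp)                       # 1.625 2, since 6.5 = 1.625 * 2**2
print(math.ldexp(sig, exp) == 6.5)    # True: the split loses nothing
```

Doubling the significand and decrementing the exponent only changes the power of two, so the pair describes the same number exactly.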
 
Historically, floating point numbers were most often represented in the format specified by the IEEE standard (IEEE 754). The standard not only defined a bit layout (format) for the numbers, but also specified how arithmetic operations on numbers in that layout should be performed. Computers that processed floating point that way were said to comply with the IEEE standard.
 
The internal IEEE format was traditionally used in math coprocessors. In the future, computers will be able to use other formats internally, because of new circuit designs and also because not all IEEE format features are needed for all types of numerical processing.
 
The IEEE external format variants have been adopted by ANSI (the American National Standards Institute) and will continue to be used for data exchange, so it will be useful to know about the IEEE format. We will discuss the IEEE format on the next page of this article. Subsequent pages describe a higher precision floating point format we are developing for our own use.
 
“When a floating point number x is represented in binary … due to finite precision in computer representation, not all numbers, even within the range of the computer, can be represented exactly.”
– Daoqi Yang, C++ and OO Numeric Computing, p.16

“all fractions which have a terminating expansion in binary system will terminate in decimal system also, but the converse is not true.”
– H.M. Antia, Num. Meth. for Sci. & Eng., 2/e, p.18

“there is an infinity of floating point numbers that cannot be represented on any computer.”
– D.M. Capper, Intro. C++ for Scientists…, 2/e, p.325

“the set of real numbers between 0 and 1 is not countable.”
– L. Wasserman, All of Statistics, p. 22
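
Antia's remark about terminating expansions can be checked mechanically. A fraction in lowest terms terminates in a given base exactly when every prime factor of its denominator also divides the base. The sketch below applies that rule; the function name `terminates` is hypothetical and chosen only for this illustration.

```python
import math
from fractions import Fraction

def terminates(fr: Fraction, base: int) -> bool:
    """True if fr has a finite digit expansion in the given base."""
    d = fr.denominator            # Fraction keeps itself in lowest terms
    g = math.gcd(d, base)
    while g > 1:
        # Remove from d every factor it shares with the base.
        while d % g == 0:
            d //= g
        g = math.gcd(d, base)
    return d == 1                 # nothing left over: the expansion ends

print(terminates(Fraction(3, 8), 2), terminates(Fraction(3, 8), 10))    # True True
print(terminates(Fraction(1, 10), 2), terminates(Fraction(1, 10), 10))  # False True
```

Since 2 divides 10, any denominator that is a power of two passes the test in base 10 as well, which is why binary termination implies decimal termination. The factor 5 divides 10 but not 2, which is why the converse fails for fractions such as 1/10.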
 
 
Contents
Page 1: Floating Point Numbers (This Page)
Page 2: Historic Format (ANSI/IEEE)
Page 3: Wide Floating Point
Page 4: Exponent
Page 5: Digit Ordering
Page 6: Arithmetic and Truncation
Page 7: Multiplication
Page 8: Division
Page 9: Elementary Functions
Page 10: External Format
 
 
 
Copyright © 2008 by 3D Software. All rights reserved.
3D Software, P.O. Box 221190, Sacramento CA 95822 USA
www.3DSoftware.com
Thursday, 28-Aug-2008 04:09:12 GMT