1.0 Scientific Notation

Scientific notation uses a two-part representation for a number, consisting of a signed fraction (mantissa) and a signed exponent (characteristic).

An example might be +1.234 x 10^23

Numbers written in scientific notation are usually written in a normal form (for example, with the fraction scaled so that its magnitude is at least 1 and less than 10). If the result of some calculation is 0.0001 x 10^15, then it must be normalized as

1.000 x 10^11
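
The normalization step above can be sketched in a few lines of Python. This is only an illustration: the function name normalize and the use of the decimal module are assumptions made for this example, not part of the original notes.

```python
from decimal import Decimal

def normalize(mantissa, exponent):
    """Return (m, e) with 1 <= |m| < 10 and m x 10^e equal to the input.

    `mantissa` is a decimal string and `exponent` an integer power of ten.
    (Hypothetical helper, written only for this illustration.)
    """
    m = Decimal(mantissa)
    if m == 0:
        return m, exponent          # zero has no normal form; leave it alone
    shift = m.adjusted()            # power of ten of the leading nonzero digit
    return m.scaleb(-shift), exponent + shift

m, e = normalize("0.0001", 15)
print(f"{m} x 10^{e}")              # fraction 1, exponent 11
```

Shifting the decimal point four places to the right is compensated by subtracting four from the exponent, so the value itself is unchanged.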

When scientific notation is used to represent inexact values on a computer it is usually referred to as floating point notation.

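When graphed, adjacent floating point numbers close to zero are very closely spaced. The gap between adjacent floating point numbers grows with their magnitude, however, and far from zero it becomes enormous: near the largest value of the 64-bit double format (Section 2.0.3), neighboring numbers are about 2 x 10^292 apart!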
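
The uneven spacing can be observed directly. The sketch below assumes Python 3.9 or later (for math.ulp) and assumes that Python floats are 64-bit IEEE doubles, which is the case on common platforms; the helper name gap is introduced only for this example.

```python
import math

def gap(x):
    """Distance from x to the next larger-magnitude double (hypothetical helper)."""
    return math.ulp(x)

# The gap grows as the magnitude of x grows.
for x in (1e-300, 1.0, 1e15, 1e300):
    print(f"x = {x:g}   gap to neighbor = {gap(x):g}")
```

Near 1.0 the gap is 2^-52, while near the top of the range it exceeds the size of most numbers that arise in practice.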

2.0 IEEE Floating Point Representations

The Institute of Electrical and Electronics Engineers (IEEE) is an engineering professional society which has been concerned with establishing various standards in the fields of electrical engineering, electronics and computer engineering. During the early years of computer design, each manufacturer designed its own method of using scientific notation to represent inexact computational values. Not only were these representations incompatible, but they also used different operational methods to handle conversions and rounding of inexact values. This meant that the same sequence of inexact operations could produce different results on different computers!

In an effort to standardize inexact computation, the IEEE formed a committee which, in the mid 1980's, produced IEEE Standard 754 for binary floating point arithmetic. Since then, most computer manufacturers have begun to use this standard for their representations. The formats are given, in part, beginning in Section 2.0.1.

2.0.1 Formats

Listed below are three IEEE data types. All three are also SANE (Standard Apple Numeric Environment) data types. In fact, Apple's SANE package provides the most complete and accurate implementation of the IEEE standard to date. Each of the diagrams in the following pages is followed by a table that gives the rules for evaluating the number. In each field of each diagram, the leftmost bit is the msb (most significant bit) and the rightmost is the lsb (least significant bit). Symbols used in the diagrams are defined in the following table.

___________________________________________
Symbol   Description
___________________________________________
v        value of the number
s        sign bit
e        biased exponent
i        explicit one's bit (extended type only)
f        fraction

2.0.2 Single

The 32-bit single format is divided into three fields, listed here from msb to lsb: the sign s (1 bit), the biased exponent e (8 bits), and the fraction f (23 bits).

The value v of the number is determined by these fields as shown in the following table:

Values of single-format numbers (32 bits)


___________________________________________________________________
e        f         v                                  class of v
___________________________________________________________________
0<e<255  (any)     v = (-1)^s x 2^(e-127) x (1.f)     normalized
e=0      f!=0      v = (-1)^s x 2^(e-126) x (0.f)     denormalized
e=0      f=0       v = (-1)^s x 0                     zero
e=255    f=0       v = (-1)^s x infinity              infinity
e=255    f!=0      v is a NaN                         NaN
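
The rules in the table can be applied mechanically to a 32-bit pattern. The following is a minimal sketch, not a reference implementation: the function name decode_single is hypothetical, and the field widths (1 sign bit, 8 exponent bits, 23 fraction bits) are the standard IEEE 754 single layout.

```python
import struct

def decode_single(bits):
    """Evaluate a 32-bit single pattern (given as an int) by the table's rules."""
    s = (bits >> 31) & 0x1          # sign bit
    e = (bits >> 23) & 0xFF         # 8-bit biased exponent
    f = bits & 0x7FFFFF             # 23-bit fraction
    sign = -1.0 if s else 1.0
    if e == 255:                    # infinity or NaN
        return sign * float("inf") if f == 0 else float("nan")
    if e == 0:                      # zero or denormalized: significand is 0.f
        return sign * (f / 2**23) * 2.0**-126
    return sign * (1 + f / 2**23) * 2.0**(e - 127)   # normalized: 1.f

print(decode_single(0x3FC00000))    # 1.5
# Cross-check against the machine's own interpretation of the same bytes.
print(struct.unpack(">f", (0x3FC00000).to_bytes(4, "big"))[0])
```

For 0x3FC00000 the fields are s=0, e=127 and f=0x400000, so the value is 2^0 x 1.1 (binary), which is 1.5.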

2.0.3 Double

The 64-bit double format is divided into three fields, listed here from msb to lsb: the sign s (1 bit), the biased exponent e (11 bits), and the fraction f (52 bits).

The value v of the number is determined by these fields as shown in the following table:

Values of double-format numbers (64 bits)


___________________________________________________________________
e        f         v                                  class of v
___________________________________________________________________
0<e<2047 (any)     v = (-1)^s x 2^(e-1023) x (1.f)    normalized
e=0      f!=0      v = (-1)^s x 2^(e-1022) x (0.f)    denormalized
e=0      f=0       v = (-1)^s x 0                     zero
e=2047   f=0       v = (-1)^s x infinity              infinity
e=2047   f!=0      v is a NaN                         NaN

For example, the double representation (in hex notation) of 1.5 is

3FF8000000000000

and the hex pattern

3F847AE147AE147A

represents a value within one unit in the last place of 0.01.
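
Hex patterns such as the ones above can be checked with Python's struct module. This sketch assumes that Python floats are IEEE doubles; the helper names double_to_hex and hex_to_double are hypothetical.

```python
import struct

def double_to_hex(x):
    """Big-endian hex string of the 64-bit double representation of x."""
    return struct.pack(">d", x).hex().upper()

def hex_to_double(h):
    """Value of the double whose 64-bit pattern is the hex string h."""
    return struct.unpack(">d", bytes.fromhex(h))[0]

print(double_to_hex(1.5))                  # 3FF8000000000000
print(hex_to_double("3F847AE147AE147A"))   # a value very close to 0.01
print(double_to_hex(0.01))                 # nearest double to 0.01
```

The second pattern shown in the text differs from the nearest double to 0.01 only in the last hex digit (A instead of B), which is the result one would expect if the fraction were truncated rather than rounded.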

2.0.4 Extended

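The 80-bit extended format is divided into four fields, listed here from msb to lsb: the sign s (1 bit), the biased exponent e (15 bits), the explicit one's bit i (1 bit), and the fraction f (63 bits).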

The value v of the number is determined by these fields as shown in the following table:

Values of extended-format numbers (80 bits)


__________________________________________________________________________
e            i      f      v                                  class of v
__________________________________________________________________________
0<=e<=32766  1      (any)  v = (-1)^s x 2^(e-16383) x (1.f)   normalized
0<=e<=32766  0      f!=0   v = (-1)^s x 2^(e-16383) x (0.f)   denormalized
0<=e<=32766  0      f=0    v = (-1)^s x 0                     zero
e=32767      (any)  f=0    v = (-1)^s x infinity              infinity
e=32767      (any)  f!=0   v is a NaN                         NaN
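
Python has no built-in 80-bit type, but the table can still be evaluated exactly using integers and fractions. The sketch below is illustrative only: decode_extended is a hypothetical name, and the field widths (1, 15, 1 and 63 bits) are the usual layout of the 80-bit extended format.

```python
from fractions import Fraction

def decode_extended(bits):
    """Evaluate an 80-bit extended pattern (given as an int) by the table's rules.

    Finite values are returned as exact Fractions (so the sign of zero is
    not preserved); infinities and NaNs are returned as Python floats.
    """
    s = (bits >> 79) & 0x1             # sign bit
    e = (bits >> 64) & 0x7FFF          # 15-bit biased exponent
    i = (bits >> 63) & 0x1             # explicit one's bit
    f = bits & ((1 << 63) - 1)         # 63-bit fraction
    sign = -1 if s else 1
    if e == 32767:                     # infinity or NaN, regardless of i
        return sign * float("inf") if f == 0 else float("nan")
    significand = i + Fraction(f, 1 << 63)       # i.f, i.e. 1.f or 0.f
    return sign * significand * Fraction(2) ** (e - 16383)

print(decode_extended(0x3FFF8000000000000000))   # 1
print(decode_extended(0x3FFFC000000000000000))   # 3/2
```

Because the one's bit is stored explicitly, the normalized, denormalized and zero rows all reduce to the single expression i.f x 2^(e-16383).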

2.1 Numbers We Cannot Represent

Finally, we need some discussion about the kinds of numbers which can be represented. When we talk about the real numbers from a mathematical point of view, we sometimes classify the reals into two groups: those which are rational and those which are not rational (irrational).

The rationals are those which can be represented in the form

a/b where a and b are integers and b is not zero.

An equivalent formulation is that the rationals are those numbers whose representation in any radix eventually repeats (a representation that terminates is one that repeats in the digit zero).

The irrational numbers can be characterized as those numbers whose representation in any radix never repeats.

This means that an infinite amount of memory would be required to exactly represent an irrational number. Since this is impossible, we are forced to cut off the representation after a fixed number of digits. As soon as this is done, we are no longer representing the irrational exactly, but rather, we have substituted a rational (which approximates the irrational) whose radix representation repeats in the digit zero.

This means that all computer numeric representations (called machine numbers) are necessarily rational numbers.
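
This can be seen concretely by asking for the exact rational value that a machine number holds. The sketch below uses Python's fractions module and assumes that Python floats are IEEE doubles; the variable names are chosen only for this example.

```python
import math
from fractions import Fraction

# The double nearest to 0.1 is a rational with a power-of-two denominator,
# not the rational 1/10 itself.
tenth = Fraction(0.1)
print(tenth)                         # 3602879701896397/36028797018963968
print(tenth == Fraction(1, 10))      # False

# The machine value standing in for the irrational sqrt(2) is also rational,
# so its square cannot be exactly 2.
root = Fraction(math.sqrt(2))
print(root * root == 2)              # False
```

Even 1/10, which is rational, is replaced by a nearby machine number because its binary representation repeats in a nonzero pattern and must be cut off.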