Floating Point Numbers Representation

Made by Mike_Zhang


Computer System article:
Signed binary numbers representation
Floating point numbers representation
UltraFish Plus - Signed binary number convertor
UltraFish Plus - Floating Point Numbers Representation Convertor
UltraFish Plus - Multiple Bases Unsigned Integer Convertor
Y86-64 Learning 1-State & Instruction & Basic Encoding
Y86-64 Learning 2-Y86-64 SEQ Stages
x86-64 Learning 1-Introduction & Data Formats & Information Accessing & Arithmetic Logical Operation
x86-64 Learning 2-Control


I have talked about the Signed binary number representation in a previous article. Then I am going to record the Floating point numbers (FPN) representation, including Floating point numbers normalization, hidden bit, FPN representation in the computer, and IEEE 754 Standard.

Floating point numbers normalization

Usually, a floating point number can be represented by 3 parts: Sign, Exponent, and Fraction (aka Significand or Mantissa).
Example:

+1.23 * 10^45

“+” is the Sign
45 is the Exponent
1.23 is the Fracrion
10 is the Base

For a given number, the location of radix point is fixed. It is located immediately to the left OR right of the leftmost AND nonzero digit in the fraction.
Example:

+1.23 10^45 OR +0.123 10^46

FPN can be normalized in any base.


Hidden bit

Hidden bit is aopptional choise for FPN normalization.
When the fraction is in binary, there will always be a leftmost “1” in the normalized fraction, because the point is always located immediately to the left OR right of the leftmost AND nonzero digit in the fraction.
Example:

111.01 2^3 = 1.1101 2^5 = 0.11101 * 2^6

So the computer does not have to store this “1”, which is hidden bit or hidden 1.
If the Hidden bit is used in the FPN normalization, for 1.1101, the computer only needs to store 1101 for the normalized fraction.


FPN representation in the computer

In a computer, the FPN representation is specified in:

  1. The base of the original number;
  2. The location of radix point in normalization: left OR right of the leftmost AND nonzero digit in the fraction;
  3. The number of bits of exponent and fraction storage;
  4. the representation of exponent, e.g. Excess, One’s Complement.

Example:

A computer requires the order that:

  1. Sign bit;
  2. 3-bit Excess 4 exponent;
  3. Base 16, 3-bit hexadecimal fraction;

and

  1. The point is located immediately to the left of the leftmost and nonzero digit in the fraction.

For example, the number is +120.0 (base 10).
step1. Convert +120.0 to base 16, 120.0(base 10) = 78.0(base 16);
step2. Normalization: 78.0 = 0.780 * 16^2;
step3. Sign bit: “+” -> 0;
step4. Convert fraction into base2: 780 (base 16) = 0111 1000 0000 (base 2);
step5. Exponent in Excess 4: 2 + 4 = 6 (base 10) = 110 (base 2);
step6. Combine them together: 0 110 0111 1000 0000 (Sign bit | Exponent | Fraction).


IEEE 754 Standard

IEEE stands for Institute of Electrical and Electronics Engineers. IEEE 754 is a standard for Floating point numbers representation.
IEEE 754 has two formats: single precision and double precision. Single precision has 32 bits, and double precision has 64 bits.
IEEE 754 single precision standard:

  1. In base 2;
  2. Using Hidden bit;
  3. Sign bit: 0 for positive, 1 for negative (1 bit);
  4. Using 8-bit Excess 127 exponent (8 bits);
  5. The point is located immediately to the right of the leftmost and nonzero digit in the fraction (23 bits with 1 bit hidden);

Example:

The number is +120.0 (base 10);

  1. convert to base 2: 1111000;
  2. Normalization: +1.111 * 2^6;
  3. Sign bit: 0;
  4. Exponent: 6 + 127 = 133(base 10) = 1000 0101 (base 2);
  5. Fraction: 111 0000 0000 0000 0000 0000 (with hidden 1);
  6. Combination: 0 1000 0101 111 0000 0000 0000 0000 0000 (Sign bit | Exponent | Fraction).

Useful IEEE 754 website:
http://weitz.de/ieee/
UltraFish Plus - Floating point numbers convertor (IEEE 754 single precision)

Special IEEE 754 bit patterns

+0: 0 0000 0000 000 0000 0000 0000 0000 0000;
-0: 1 0000 0000 000 0000 0000 0000 0000 0000;
+(-)INFINITY: 0(1) 1111 1111 000 0000 0000 0000 0000 0000;
+(-)NaN (Not a Number): 0(1) 1111 1111 001 0000 0110 0001 0010 0000 (The fraction is nonzero);
+2^(-128): 0 1111 1111 010 0000 0000 0000 0000 0000;


References

IEEE 754 Calculator: http://weitz.de/ieee/


The End

I have recorded Floating point numbers normalization, hidden bit, FPN representation in the computer, and IEEE 754 Standard in this article.
Please feel free to leave your comments, if you have any questions or find any mistakes. Thanks for reading.


Original article, please indicate the source when sharing
Made by Mike_Zhang




Thanks for your support(WeChat QR Code)

Floating Point Numbers Representation
https://ultrafish.io/post/floating-point-numbers-representation/
Author
Mike_Zhang
Posted on
December 20, 2020
Licensed under