Binary Floating Point

Definition

The Binary Floating Point Algorithm is a method for representing real numbers in binary. It breaks a real number down into three fields: a sign bit, an exponent, and a fraction (also known as the mantissa or significand). Because each field has a fixed width, the resulting values are compact to store and arithmetic on them can be implemented efficiently.

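Computers typically store floating-point numbers using the IEEE 754 standard. Compared with fixed-point formats, it covers a much wider range of magnitudes, including very small values close to zero. The exponent is stored with a bias (a fixed offset added to the true exponent), so negative exponents can be stored as unsigned integers.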
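To make the biased exponent concrete, here is a minimal Python sketch that pulls apart a single-precision value and rebuilds it from its fields. The function names `decode_float32` and `value_from_fields` are hypothetical helpers chosen for illustration. The sketch assumes the single-precision layout (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits) and only covers normal numbers.

```python
import struct


def decode_float32(x: float):
    """Return (sign, biased_exponent, fraction) of x stored as a 32-bit float."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                      # top bit
    biased_exponent = (raw >> 23) & 0xFF  # next 8 bits
    fraction = raw & 0x7FFFFF             # low 23 bits
    return sign, biased_exponent, fraction


def value_from_fields(sign: int, biased_exponent: int, fraction: int) -> float:
    """Rebuild a normal float32 value: (-1)^s * (1 + f / 2^23) * 2^(e - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (biased_exponent - 127)


# 0.15625 = 1.25 * 2**-3, so the stored exponent is -3 + 127 = 124.
print(decode_float32(0.15625))  # (0, 124, 2097152)
```

The true exponent here is negative (-3), yet the stored field is the unsigned value 124, which is what the bias is for.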

The standard splits the representation of each number into three parts:

  • sign
  • exponent
  • fraction

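Different floating-point formats allocate different numbers of bits to each part:

  • half-precision (16 bits: 1 sign, 5 exponent, 10 fraction)
  • single-precision (32 bits: 1 sign, 8 exponent, 23 fraction)
  • double-precision (64 bits: 1 sign, 11 exponent, 52 fraction)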
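The following sketch shows how the same value is laid out in each of the three formats. It relies on Python's standard `struct` module, whose `e`, `f`, and `d` format codes correspond to half, single, and double precision. The `FORMATS` table and the `split_fields` function are illustrative names, not part of any standard API, and the field widths in the table are the IEEE 754 ones (5/10, 8/23, and 11/52 exponent/fraction bits).

```python
import struct

# precision name -> (struct float code, struct unsigned-int code, exponent bits, fraction bits)
FORMATS = {
    "half": ("e", "H", 5, 10),
    "single": ("f", "I", 8, 23),
    "double": ("d", "Q", 11, 52),
}


def split_fields(x: float, precision: str):
    """Return the (sign, exponent, fraction) bit strings of x in the given format."""
    float_code, int_code, exp_bits, frac_bits = FORMATS[precision]
    # Reinterpret the packed float bytes as an unsigned integer of the same size.
    (raw,) = struct.unpack(">" + int_code, struct.pack(">" + float_code, x))
    total_bits = 1 + exp_bits + frac_bits
    bits = f"{raw:0{total_bits}b}"
    return bits[0], bits[1:1 + exp_bits], bits[1 + exp_bits:]


for name in FORMATS:
    print(name, split_fields(1.0, name))
```

For 1.0 the true exponent is 0, so each format stores just its bias in the exponent field (15, 127, and 1023 respectively) and an all-zero fraction.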

Practice

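function binaryFloatingPointAlgorithm(realNumber):
    // Zero, subnormals, infinities, and NaN need special handling and are omitted here.

    // Step 1: Extract Sign
    if realNumber < 0:
        signBit = 1
    else:
        signBit = 0

    // Step 2: Normalize the magnitude to the form 1.f × 2^exponent
    magnitude = abs(realNumber)
    exponent = calculateExponent(magnitude)
    fraction = calculateFraction(magnitude, exponent)  // the bits after the leading 1

    // Step 3: Convert to Binary
    // The exponent is stored with the format's bias added (127 for single precision).
    signBinary = convertToBinary(signBit)
    exponentBinary = convertToBinary(exponent + bias)
    fractionBinary = convertToBinary(fraction)

    // Step 4: Combine Components
    binaryRepresentation = concatenate(signBinary, exponentBinary, fractionBinary)

    // Step 5: Output
    return binaryRepresentation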
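The steps above can be sketched as runnable Python. This version targets double precision (1 sign bit, 11 exponent bits with bias 1023, 52 fraction bits) because Python's `float` is already a double, so no rounding step is needed. The names `encode_float64` and `reference_bits` are hypothetical, and the sketch deliberately rejects NaN, infinities, and subnormal numbers rather than encoding them.

```python
import math
import struct


def encode_float64(x: float) -> str:
    """Return the 64-bit IEEE 754 pattern of x as a string of '0' and '1'."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and infinity are not handled in this sketch")

    # Step 1: extract the sign (copysign also distinguishes -0.0 from 0.0).
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    magnitude = abs(x)

    if magnitude == 0.0:
        # Zero is a special case: all exponent and fraction bits are 0.
        biased_exponent, fraction = 0, 0
    else:
        # Step 2: normalize. frexp gives magnitude == m * 2**e with 0.5 <= m < 1,
        # which is the same as (2 * m) * 2**(e - 1) with 1 <= 2 * m < 2.
        m, e = math.frexp(magnitude)
        biased_exponent = (e - 1) + 1023
        if biased_exponent <= 0:
            raise ValueError("subnormal numbers are not handled in this sketch")
        # Drop the implicit leading 1 and scale the rest to a 52-bit integer.
        fraction = int((2 * m - 1) * 2**52)

    # Steps 3 and 4: convert each field to fixed-width binary and concatenate.
    return f"{sign:01b}{biased_exponent:011b}{fraction:052b}"


def reference_bits(x: float) -> str:
    """Bit pattern produced by the platform's own encoding, for comparison."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return f"{raw:064b}"


print(encode_float64(-6.25))
print(encode_float64(-6.25) == reference_bits(-6.25))  # True
```

Comparing against `struct` is a quick way to check the manual encoding; a single-precision version would additionally need to round the fraction to 23 bits.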