Question 1

'Floating-point can also represent fractions between powers of 2:
0.750 = 1.5 × 2^−1'

But wait a minute. 1.5 is also a floating-point number, so how do you represent that? This example does not explain much.

Accepted Answer

Apologies for the confusing explanation. Let's step through how the computer actually represents that number in floating point representation and see if that helps.

Let's look at 0.750:
0.750=1.5×2^−1

The number 1.5 is called either the "mantissa" or the "significand". The number -1 is called the "exponent" (the usual math term).
The floating point representation of 0.750 in binary needs to include the sign (positive/negative), the mantissa, and the exponent.

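Here's the binary for all 64 bits (sign, exponent, and mantissa, separated by spaces for readability):
0 01111111110 1000000000000000000000000000000000000000000000000000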

The first bit represents the sign, where 0 is positive.
The next 11 bits represent the exponent -1:
01111111110
That's the decimal number 1022. According to the floating point standard, the exponent is calculated by subtracting 1023 from that value. 1022-1023 is -1, which is indeed the exponent.

The final 52 bits represent the mantissa 1.5:
1000000000000000000000000000000000000000000000000000

That's a 1 followed by 51 zeros. According to the floating point standard, the first bit represents 1/2 (0.5), the second bit represents 1/4, the third bit represents 1/8, etc. Together those bits represent a value between 0 and 1, which is added to an implicit leading 1 to give a mantissa between 1 and 2. In this case, there is a 1 in the first bit only, so this mantissa is 1 + 0.5 = 1.5.

The key thing here is that the mantissa only ever needs to be between 1.0 and 2.0 (excluding 2). If it goes above that, then the exponent can be increased instead.
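If you'd like to check those three fields yourself, here is a minimal Python sketch. The helper name `decode_double` is made up for this example, and it assumes an ordinary (normal) number like 0.75; zero, infinities, and very tiny numbers follow special rules that it ignores.

```python
import struct

def decode_double(x):
    """Split a Python float (a 64-bit double) into its three bit fields."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                        # 1 bit
    stored_exponent = (bits >> 52) & 0x7FF   # 11 bits, still includes the 1023 bias
    fraction = bits & ((1 << 52) - 1)        # 52 bits after the implicit leading 1
    return sign, stored_exponent, fraction

sign, stored_exponent, fraction = decode_double(0.75)
exponent = stored_exponent - 1023            # remove the bias
mantissa = 1 + fraction / 2**52              # add back the implicit leading 1

print(sign, format(stored_exponent, "011b"), format(fraction, "052b"))
print(mantissa, exponent, mantissa * 2**exponent)   # 1.5 -1 0.75
```

Swapping 0.75 for another number shows how the exponent and mantissa bits move, much like clicking bits in the tool linked below.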

I find it helpful to see what happens when you change bits in a representation. You can do that at this tool:
https://float.exposed/0x3fe8000000000000
It currently represents 0.75 in 64-bit floating point. Try clicking the 1s or 0s to see what happens when they take on different values.

I also recommend this explanation of floating point representation:
http://fabiensanglard.net/floating_point_visually_explained/

Question 2

Much harder to understand these without videos :/

Accepted Answer

for real

Question 3

Why is a floating point called that way? What is floating about it?

Accepted Answer

Great question. The floating part is the decimal point (the separator between the whole part and the fractional part). Floating point representation can represent both very large numbers with a lot of digits before the point (like 1292929.1) and numbers with a lot of digits after the point (like 1.29292929). A 64-bit floating point number can use its 52 mantissa bits to represent both the digits in the whole part and the digits in the fractional part, and the exponent says where the point sits. A contrasting type of representation is "fixed point", which always uses a certain number of bits to represent the whole part and a certain number of bits to represent the fractional part.
Here's a nice longer explanation: https://stackoverflow.com/questions/7524838/fixed-point-vs-floating-point-number

Question 4

I realized there is a pattern in binary numbers
00000
00001
00010
00011
00100
00101
00110
00111
01000
.....
the last row goes 010101010101010......
the second last row goes 00110011001100110011........
the third last row goes 000011110000111100001111.......
the fourth last row goes 000000001111111100000000111111110000000011111111.....
the fifth last row goes 00000000000000001111111111111111000000000000000011111111111111110000000000000000111111111111111100000000000000001111111111111111......
and so on

Accepted Answer

Nice observation!

If you look at how long each column (what you call a row) stays the same before it flips (1 -> 0 or 0 -> 1), you'll see it is exactly 2^i, which is also what the ith column from the right (counting from 0) contributes to the value of the number.

For instance, the third column from the right flips every 2^2 = 4 numbers. Hence, that position (position 2, counting from 0) contributes 4 to the value.
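Here is a small Python sketch that prints those columns so you can see the flipping pattern directly. The function name `column_bits` is just something made up for this example.

```python
def column_bits(position, count):
    """Bits found at `position` (0 = rightmost) in the numbers 0 .. count-1."""
    return "".join(str((n >> position) & 1) for n in range(count))

# Position i holds the same bit for 2**i numbers in a row before flipping.
for position in range(4):
    print(position, 2**position, column_bits(position, 16))
```

The third value on each printed line is the column read from top to bottom, and the second value is how many numbers pass between flips.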

Question 5

I didn't understand how a floating point number like 0.375 is written in binary. It's getting very confusing for me!
Somebody help me out..

Accepted Answer

There are 3 parts of a floating-point representation (using the IEEE-754 standard). The first bit is used to determine the sign of the number, 0 is positive and 1 is negative. The next section is the exponent of the number represented in scientific form with an added bias. The final section of the representation is the mantissa of the number in scientific form after dropping the leading 1.

Suppose we want to convert 0.375 to its floating-point representation.

0.375 (decimal) = 0.011 (binary) = 1.1 (binary) * 2^(-2)

sign bit = 0 since 0.375 is positive

exponent = bias + original exponent = 1023 + (-2) = 1021 = 01111111101

mantissa = number after the leading 1 = 1000000000000000000000000000000000000000000000000000

floating-point representation = sign bit, exponent, mantissa = 0011111111011000000000000000000000000000000000000000000000000000

(In this example, the floating-point representation is in double precision format.)
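To double-check the hand conversion, a minimal Python sketch can build the pattern from the three pieces and compare it with what the machine actually stores. The helper `double_bits` is a made-up name for this example; it relies on Python floats being 64-bit doubles.

```python
import struct

def double_bits(x):
    """Return the 64-bit IEEE-754 pattern of a Python float as a string."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(bits, "064b")

# Build the pattern by hand, following the three steps for 0.375.
sign = "0"                              # positive
exponent = format(1023 + (-2), "011b")  # bias + original exponent, as 11 bits
mantissa = "1" + "0" * 51               # bits after the leading 1, padded to 52
by_hand = sign + exponent + mantissa

print(by_hand)
print(by_hand == double_bits(0.375))    # True
```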

Question 6

I was experimenting with this in Swift using doubles and floats. I wrote: 
var result2: Double = 0.1 + 0.1 + 0.1
var result3: Float = 0.1 + 0.1 + 0.1

The double returned 0.30000000000000004 and the float returned 0.3. In Swift, Double represents a 64-bit floating point number and Float represents a 32-bit floating point number. So it did that because a float is... I guess... less specific?

Accepted Answer

Funnily enough, yes.

A number like 0.1 isn't easy to store, because it can't be written as a finite sum of powers of two. You see, when you tell your computer to work with the number 0.1, the number has to be converted to binary.
The conversion (done by hand, mind you!) works something like this: we multiply by two, write down and cut off the 1 if it pops up, and don't stop until we hit 0 or we're tired of the algorithm.
0.1 * 2 = 0.2 | 0
0.2 * 2 = 0.4 | 0
0.4 * 2 = 0.8 | 0
0.8 * 2 = 1.6 | 1
0.6 * 2 = 1.2 | 1
0.2 * 2 = 0.4 | 0
at this point we're starting a loop :(

so the decimal 0.1 converts to the binary 0.00011001100110011 .... (the string 0011 will just keep repeating making the result more precise but never actually reaching 0.1)
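The multiply-by-two steps above can be sketched in Python like this. The function name `binary_fraction_digits` is invented for the example, and it uses exact fractions so that the sketch itself doesn't suffer from the rounding it is trying to show.

```python
from fractions import Fraction

def binary_fraction_digits(value, max_digits):
    """Binary digits after the point for 0 <= value < 1, by repeated doubling."""
    digits = []
    while value != 0 and len(digits) < max_digits:
        value *= 2
        if value >= 1:
            digits.append("1")   # a 1 popped up: record it and cut it off
            value -= 1
        else:
            digits.append("0")
    return "".join(digits)

print("0." + binary_fraction_digits(Fraction(1, 10), 20))  # 0.00011001100110011001
print("0." + binary_fraction_digits(Fraction(3, 8), 20))   # 0.011
```

For 1/10 the loop only stops because of the digit limit, while 3/8 reaches 0 after three steps.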
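A Double stores twice as many bits as a Float (64 instead of 32), so it is more precise, but neither can store 0.1 exactly. With a Float, the rounding errors in 0.1 + 0.1 + 0.1 happen to land on the Float closest to 0.3, so it prints as 0.3. With a Double, the sum lands just above the Double closest to 0.3, and the extra precision makes that tiny error visible.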
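If you want to reproduce the Float versus Double result outside Swift, here is a rough Python sketch. Python only has 64-bit floats built in, so the 32-bit case is simulated by rounding to 32 bits after every step; the helper `to_float32` is made up for this example.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 64-bit arithmetic: Python floats are doubles.
double_sum = 0.1 + 0.1 + 0.1

# 32-bit arithmetic, simulated by rounding to 32 bits after every step.
tenth = to_float32(0.1)
float_sum = to_float32(to_float32(tenth + tenth) + tenth)

print(double_sum == 0.3)               # False: the double sum is slightly above 0.3
print(float_sum == to_float32(0.3))    # True: the 32-bit sum is the float closest to 0.3
```

In both cases the stored numbers are approximations; the 32-bit result just happens to match the approximation of 0.3.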

Question 7

why does the computer perceive 0.1 as infinitely repeating?

Accepted Answer

The binary number system cannot represent 0.1 exactly with a finite number of digits, in the same way that base 10 needs an infinite decimal expansion (0.33333...) for 1/3.
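You can see what actually gets stored for 0.1 with a short Python sketch, assuming a standard Python where floats are 64-bit doubles:

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.1)             # the exact value of the double nearest to 0.1
print(stored)                      # 3602879701896397/36028797018963968
print(stored == Fraction(1, 10))   # False
print(Decimal(0.1))                # the same stored value, written out in decimal
```

The denominator is a power of 2 (2^55), which is why the stored value can only be close to 1/10 and never equal to it.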

Question 8

Why do modern computers only use 64 bits? Can't they use as many bits as they want? Since more bits give more precise calculations.

Accepted Answer

Hello!
This subject gets pretty deep, but it boils down to computer architecture. 64 bits is 8 × 8 bits, so 8 bytes, and as we learned a few lessons ago, computers these days work in chunks of bytes. Most computers don't need more than 64 bits, because it would be overkill for everyday use; extra bits only pay off in specialized settings where calculations need to be more precise.

Thanks for your question!

Question 9

then how are modern computers able to give accurate computations when some numbers require infinite bits? Also how was the computer able to display 2^1023 (8.98846567431158e+307) when it wasn't able to display  9007199254740993?

Accepted Answer

I could not answer your question myself, so I looked around for a bit and found this:

Modern computers use finite-precision arithmetic to perform computations, which means that they can only represent numbers with a limited number of bits. This means that for numbers that require an infinite number of bits to represent exactly, such as irrational numbers like pi, the computer can only represent an approximation of the number. However, in practice, these approximations are usually accurate enough for most applications.

Regarding your second question, the reason why a computer can display 2^1023 (8.98846567431158e+307) but not 9007199254740993 is due to the way that floating-point numbers are laid out in the computer's memory. A 64-bit floating-point number has 52 mantissa bits, which (together with the implicit leading 1) gives it 53 significant binary digits. This means that every whole number up to 2^53 = 9007199254740992 can be represented exactly, but beyond that there are gaps between the whole numbers that can be represented.

In the case of 9007199254740993, which is 2^53 + 1, the number needs 54 significant binary digits, so the computer cannot represent it exactly and it is rounded to a neighboring representable number (9007199254740992). On the other hand, 2^1023 is a power of 2, so it needs only one significant binary digit, and its exponent of 1023 is the largest one that the 11 exponent bits allow. It can therefore be stored exactly and displayed accurately.
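Here is a small Python sketch of both cases. It assumes Python's `float`, which is a 64-bit floating-point number, and uses Python's unlimited-size integers to check the results.

```python
big_odd = 9007199254740993               # 2**53 + 1
print(big_odd == 2**53 + 1)              # True
print(int(float(big_odd)))               # 9007199254740992: rounded to a neighbor
print(int(float(2**1023)) == 2**1023)    # True: a power of 2 is stored exactly
```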

Question 10

are there some other programming scenarios when 2^1024 = infinity?

Accepted Answer

In JavaScript, Math.pow(2, 1024) is Infinity due to the limitations of how numbers are stored in JavaScript. However, those limitations vary by language and environment. I just ran the same calculation in a Java environment and got a result of Infinity, but when I ran it in my local Python environment, I got a numeric result.
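For what it's worth, here is a sketch of what I'd expect in a standard Python 3 environment. The result depends on whether you use whole numbers or floats, which may explain the numeric result.

```python
import math

# Python ints are not limited to 64 bits, so this is an exact whole number.
print(2 ** 1024 > 10 ** 308)        # True

# With floats, going past the largest double gives infinity here...
print(2.0 ** 1023 * 2)              # inf
print(math.isinf(float("1e309")))   # True

# ...but float exponentiation reports the overflow as an error instead.
try:
    2.0 ** 1024
except OverflowError as error:
    print("OverflowError:", error)
```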

0	0	0	1
+/-	4	2	1
sign	2^2	2^1	2^0

0	1	1	1
+/-	4	2	1
sign	2^2	2^1	2^0
