Main content

## AP®︎/College Computer Science Principles

### Unit 1: Lesson 3

Limitations of storing numbers# Number limits, overflow, and roundoff

AP.CSP:

DAT‑1 (EU)

, DAT‑1.B (LO)

, DAT‑1.B.1 (EK)

, DAT‑1.B.2 (EK)

, DAT‑1.B.3 (EK)

When computer programs store numbers in variables, the computer needs to find a way to represent that number in computer memory. Computers use different strategies based on whether a number is an integer or not. Due to limitations in computer memory, programs sometimes encounter issues with

**roundoff**,**overflow**, or**precision**of numeric variables.### Integer representation

An

**integer**is any number that can be written without a fractional component. The same term is used in both programming and in math, so hopefully it's familiar to you.All of these numbers are integers: 120, 10, 0, minus, 20.

How can a programming language represent those integers in computer memory? Well, computers represent

*all data*with bits, so we know that ultimately, each of those numbers is a sequence of 0s and 1s.To start simple, let's imagine a computer that uses only 4 bits to represent integers. It can use the first bit to represent the sign of the integer, positive or negative, and the other 3 bits for the absolute value.

In that system, the number 1 would be represented like this:

0 | 0 | 0 | 1 |
---|---|---|---|

+/- | 4 | 2 | 1 |

sign | 2, squared | 2, start superscript, 1, end superscript | 2, start superscript, 0, end superscript |

The 0 in the sign bit represents a positive number, and the 1 in the right most bit represents the 2, start superscript, 0, end superscript (1) place of the value.

What's the largest number this system could represent? Let's fill all the value bits with 1 and see:

0 | 1 | 1 | 1 |
---|---|---|---|

+/- | 4 | 2 | 1 |

sign | 2, squared | 2, start superscript, 1, end superscript | 2, start superscript, 0, end superscript |

That's the positive number 7, since 2, squared, plus, 2, start superscript, 1, end superscript, plus, 2, start superscript, 0, end superscript, equals, left parenthesis, 4, plus, 2, plus, 1, right parenthesis, equals, 7.

### Overflow

What would happen if we ran a program like this on the 4-bit computer, where the largest positive integer is 7?

```
var x = 7;
var y = x + 1;
```

The computer can store the variable

`x`

just fine, but `y`

is one greater than the largest integer it can represent with the 4 bits. In a case like this, the computer might report an "overflow error" or display a message like "number is too large". It might also truncate the number (capping all results to 7) or wrap the number around (so that 8 becomes 1). We don't want to end up in any of those situations, so it's important we know the limitations of our language and environment when writing programs.

Fortunately, most modern computers use 64-bit architectures which can store incredibly large integers. In JavaScript, the largest safe integer is 9,007,199,254,740,992, equivalent to 2, start superscript, 53, end superscript, minus, 1. Beyond that, and we're in the danger zone.

### Floating-point representation

We've seen there are limitations to storing integers in a computer. Numbers that aren't integers, like fractions and irrational numbers, are even trickier to represent in computer memory.

Consider numbers like 2, slash, 5, 1, point, 234, 9, point, 999999, or the famously never-ending pi.

Computer languages typically use

**floating-point representation**for non-integers (and sometimes integers, too). It's similar to "scientific notation", a representation you might know from other studies.In floating-point representation, a number is multiplied by a base that's raised to an exponent:

Since computers use the binary system instead of the decimal system, the base for floating-point numbers is 2 instead of 10. Because of that, numbers that are exactly powers of 2 are the simplest to represent:

Numbers between powers of 2 look like this:

What about non-integers? Once again, powers of 2 are the simplest to represent.

Floating-point can also represent fractions between powers of 2:

Once the computer determines the floating point representation for a number, it stores that in bits. Modern computers use a 64-bit system that uses 1 bit for the sign, 11 bits for the exponent, and 52 bits for the number in front.

Here's 0, point, 375 in that binary floating-point representation:

`0011111111011000000000000000000000000000000000000000000000000000`

The exact translation to bits is more complicated than we can go into here, but is a great topic for those of you who want to dive deeper.

### Roundoff errors

Floating-point representation still can't fully represent all numbers, however. Consider the fraction 1, slash, 3 and its floating point representation:

In binary, point, start overline, 3, end overline is still an infinitely repeating sequence:

`0101010101…`

We can't store an infinite sequence in a computer! At some point, the computer has to end the number somehow, either by chopping it off or rounding to the nearest floating point number. Computers have to do that fairly often, as even fractions like 1, slash, 10 (which is a short 0, point, 1 in decimal) end up as infinitely repeating sequences once converted to binary.

We often don't notice the lower precision of a number's representation until we use it in calculations. That's when we can experience a

**roundoff error**in the results.✏️ Here's a program that attempts to add 0, point, 1, plus, 0, point, 1, plus, 0, point, 1. In the non-computer world, we know that's 0, point, 3. But in the computer, each of the 0, point, 1 values is stored as a rounded-off binary fraction, and when they're added together, they don't quite equal what we expect…

The more bits we can use, the more precise our numbers and calculations will be. Modern 64-bit systems offer a high enough precision for low-stakes calculations.

Perhaps at some point in your life, you'll find yourself writing programs that calculate voting outcomes, power a self-driving car, or even launch a rocket. When the stakes are high, precision matters.

🙋🏽🙋🏻♀️🙋🏿♂️Do you have any questions about this topic? We'd love to answer— just ask in the questions area below!

## Want to join the conversation?

- Why is a floating point called that way? What is floating about it?(9 votes)
- From the author:Great question. The floating part is the decimal (between the whole part and the fractional part), as floating point representation can both represent very large numbers with a lot of digits before the decimal (like 1292929.1) and very small numbers with a lot of digits after the decimal (like 1.29292929). Floating point representation can use its 52 bits to represent both the digits in the whole part and the digits in the fractional part. A contrasting type of representation is "fixed point", which always uses a certain number of bits to represent the whole part and a certain number of bits to represent the fractional part.

Here's a nice longer explanation: https://stackoverflow.com/questions/7524838/fixed-point-vs-floating-point-number(20 votes)

- I was experimenting with this in Swift using doubles and floats. I wrote:

var result2: Double = 0.1 + 0.1 + 0.1

var result3: Float = 0.1 + 0.1 + 0.1

The double returned 0.30000000000000004 and the float returned 0.3. In Swift Double represents a 64-bit floating point number and Float represents a 32-bit floating point number. So it did that because a float is... i guess... Less specific?(5 votes)- Funnily enough, yes.

A number like 0.1 isn't easy store, because it's difficult to turn into a power of two. You see when you tell your computer to work with the number 0.1, the number has to be converted.

Conversion (as a human mind you!) would work something like (we multiply by two, cut off the 1 if it pops up and don't stop until we hit 0 or we're tired of the algorithm)

0.1 * 2 = 0.2 | 0

0.2 * 2 = 0.4 | 0

0.4 * 2 = 0.8 | 0

0.8 * 2 = 1.6 | 1

0.6 * 2 = 1.2 | 1

0.2 * 2 = 0.4 | 0

at this point we're starting a loop :(

so the decimal 0.1 converts to the binary 0.00011001100110011 .... (the string 0011 will just keep repeating making the result more precise but never actually reaching 0.1)

A double can store twice the amount of information of a floating point, so it can happen that this extra space is used to "fix" rounding issues, by being more precise.(11 votes)

- I realized there is a pattern in binary numbers

00000

00001

00010

00011

00100

00101

00110

00111

01000

.....

the last row goes 010101010101010......

the second last row goes 00110011001100110011........

the third last row goes 000011110000111100001111.......

the fourth last row goes 000000001111111100000000111111110000000011111111.....

the fifth last row goes 00000000000000001111111111111111000000000000000011111111111111110000000000000000111111111111111100000000000000001111111111111111......

and so on(6 votes)- Nice observation!

If you look at the period between changing phases (1 -> 0 or 0 -> 1), you'll see you get exactly 2^i which corresponds to the ith row's contribution to the binary representation.

For instance, the third row changes every 2^2 = 4 cycles. Hence, the 3rd position (starting from 0) contributes 4 to the binary representation.(6 votes)

- Why do modern computers only use 64 bits? Can't they use as many bits as they want? Since more bits give more precise calculations.(2 votes)
- Hello!

This subject gets pretty deep, but it boils down to bit and computer architecture. 64 bits is 8^2 so 8 bytes, and as we learned a few lessons ago computers these days only work in chunks of bytes. Then most computers don't need more then 64 bits because it would be overkill unless we were in a professional setting where calculations needed to be exact.

Thanks for your question!(10 votes)

- I didn't understand how floating point representation number like 0.375 in wrtten in binary.It's getting very confusing for me!

Somebody help me out..(4 votes)- There are 3 parts of a floating-point representation (using the IEEE-754 standard). The first bit is used to determine the sign of the number, 0 is positive and 1 is negative. The next section is the exponent of the number represented in scientific form with an added bias. The final section of the representation is the mantissa of the number in scientific form after dropping the leading 1.

Suppose we want to convert 0.375 to its floating-point representation.

0.375 = 0.011 = 1.1 * 2^(-2)

sign bit = 0 since 0.375 is positive

exponent = bias + original exponent = 1023 + (-2) = 1021 = 01111111101

mantissa = number after the leading 1 = 1000000000000000000000000000000000000000000000000000

floating-point representation = sign bit, exponent, mantissa = 0011111111011000000000000000000000000000000000000000000000000000

(In this example, the floating-point representation is in double precision format.)(3 votes)

- shouldn't max value be 32 because 0 is redundant? If there are 5 bits to store a value plus one more to store a sign, that would mean there would be both a +0 and a -0. If we shifted the positive binary values so that zero isnt redundant (0 00001 = -1, 0 00000 = 0, 1 00000 = 1, 1 00001 = 2, etc.), then wouldnt the possible range of integers be from -31 to 32?(2 votes)
- Nice observation as this is what has been done in practice! However, one small modification is that a 6-bit number often ranges from -32 to 31. This is because negative numbers have a 1 as the sign bit instead of 0 in your example. One reason to use this convention is to keep consistency.

Adding 1 should increasing the value of a number.

In your scheme, if I do

(-1) + 1 --> 0 00001 + 1

I should somehow get 0 00000 out of it.

A more natural way is to use a 1 as the sign bit so (-1) becomes

1 11111 + 1 --> 0 00000

Hope this helps!(5 votes)

- Why Doesn't an 8-bit calculator experience round off error to the same degree as a 64-bit computer might? If I'm understanding, any number higher than 127 should cause it to overflow. What am I missing?(3 votes)
- Your computer being a 64-bit machine doesn't really have anything to do with the number format, rather it describes its ability to store computational values. So a 64-bit system is more powerful than a 32-bit or 8-bit system because it can work with more values at once.

Numbers are stored at a particular location and can take up space independent from your computational power, the same way a computer game can take up 60gb even though you're running a 64bit system.

Roundoff errors depend on how the system implements numbers, so if a system gives a floating-point number 32 bit it'll be less precise if a system stores the number in a 64bit space.

You can learn more about floating-point numbers by searching for IEEE 754 on Wikipedia.(3 votes)

- "9,007,199,254,740,992 is equivalent to 2^52 - 1"

Shouldn't it be odd then? Actually, JS pow function says that 2^53 = 9,007,199,254,740,992.(4 votes)- You're right about both(0 votes)

- I didn't understand the bar over the 3 what was that for ?(2 votes)
- The bar is sometimes used in mathematics to show that the numbers below the bar repeat periodically.

So in this case it means the 3 keeps repeating for ever.(4 votes)

- in the examples above on floating point presentation, why do we always multiply by 1? why wont it suffice to just figure out the exponent that the 2 is being represented by. so if were trying to figure out 128 then just do 2x7

also in the .50 and .25 example what does the negative exponent represent. why does -1 represent .50? etc....(2 votes)- Although 1*2^7 can be simplified to 2^7, the computer still needs to store the 1 in order to represent 128.

In general, x^(-y) = 1/(x^y). So, 2^(-1) = 1/(2^1) = 1/2 = 0.5(4 votes)