If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

Number limits, overflow, and roundoff

AP.CSP:
DAT‑1 (EU)
,
DAT‑1.B (LO)
,
DAT‑1.B.1 (EK)
,
DAT‑1.B.2 (EK)
,
DAT‑1.B.3 (EK)
When computer programs store numbers in variables, the computer needs to find a way to represent that number in computer memory. Computers use different strategies based on whether a number is an integer or not. Due to limitations in computer memory, programs sometimes encounter issues with roundoff, overflow, or precision of numeric variables.

Integer representation

An integer is any number that can be written without a fractional component. The same term is used in both programming and in math, so hopefully it's familiar to you.
All of these numbers are integers: 120, 10, 0, minus, 20.
How can a programming language represent those integers in computer memory? Well, computers represent all data with bits, so we know that ultimately, each of those numbers is a sequence of 0s and 1s.
To start simple, let's imagine a computer that uses only 4 bits to represent integers. It can use the first bit to represent the sign of the integer, positive or negative, and the other 3 bits for the absolute value.
In that system, the number 1 would be represented like this:
0001
+/-421
sign2, squared2, start superscript, 1, end superscript2, start superscript, 0, end superscript
The 0 in the sign bit represents a positive number, and the 1 in the right most bit represents the 2, start superscript, 0, end superscript (1) place of the value.
What's the largest number this system could represent? Let's fill all the value bits with 1 and see:
0111
+/-421
sign2, squared2, start superscript, 1, end superscript2, start superscript, 0, end superscript
That's the positive number 7, since 2, squared, plus, 2, start superscript, 1, end superscript, plus, 2, start superscript, 0, end superscript, equals, left parenthesis, 4, plus, 2, plus, 1, right parenthesis, equals, 7.
Check your understanding
Consider a computer that uses 6 bits to represent integers: 1 bit for the sign and 5 bits for the actual number. What's the largest positive integer it can represent?
  • Your answer should be
  • an integer, like 6
  • a simplified proper fraction, like 3, slash, 5
  • a simplified improper fraction, like 7, slash, 4
  • a mixed number, like 1, space, 3, slash, 4
  • an exact decimal, like 0, point, 75
  • a multiple of pi, like 12, space, start text, p, i, end text or 2, slash, 3, space, start text, p, i, end text

Overflow

What would happen if we ran a program like this on the 4-bit computer, where the largest positive integer is 7?
var x = 7;
var y = x + 1;
The computer can store the variable x just fine, but y is one greater than the largest integer it can represent with the 4 bits. In a case like this, the computer might report an "overflow error" or display a message like "number is too large". It might also truncate the number (capping all results to 7) or wrap the number around (so that 8 becomes 1).
We don't want to end up in any of those situations, so it's important we know the limitations of our language and environment when writing programs.
Check your understanding
On a computer which uses 6 bits to represent integers (with 1 bit to represent the sign), which of these operations result in overflow?
Choose all answers that apply:
Choose all answers that apply:

Fortunately, most modern computers use 64-bit architectures which can store incredibly large integers. In JavaScript, the largest safe integer is 9,007,199,254,740,992, equivalent to 2, start superscript, 53, end superscript, minus, 1. Beyond that, and we're in the danger zone.
✏️ Play around in the danger zone below! JavaScript doesn't display overflow errors, but it does do some other strange things.
📝 See similar code in: App Lab | Snap | Python

Floating-point representation

We've seen there are limitations to storing integers in a computer. Numbers that aren't integers, like fractions and irrational numbers, are even trickier to represent in computer memory.
Consider numbers like 2, slash, 5, 1, point, 234, 9, point, 999999, or the famously never-ending pi.
Computer languages typically use floating-point representation for non-integers (and sometimes integers, too). It's similar to "scientific notation", a representation you might know from other studies.
In floating-point representation, a number is multiplied by a base that's raised to an exponent:
300, equals, 3, times, start underbrace, 10, end underbrace, start subscript, start text, b, a, s, e, end text, end subscript, start superscript, start overbrace, 2, end overbrace, start superscript, start text, e, x, p, o, n, e, n, t, end text, end superscript, end superscript
Since computers use the binary system instead of the decimal system, the base for floating-point numbers is 2 instead of 10. Because of that, numbers that are exactly powers of 2 are the simplest to represent:
128=1×27256=1×28 128 = 1 \times 2^7\\ 256 = 1 \times 2^8
Numbers between powers of 2 look like this:
160=1.25×27192=1.50×27224=1.75×27 160 = 1.25 \times 2^7\\ 192 = 1.50 \times 2^7\\ 224 = 1.75 \times 2^7
What about non-integers? Once again, powers of 2 are the simplest to represent.
0.50=1×210.25=1×22 0.50 = 1 \times 2^{-1}\\ 0.25 = 1 \times 2^{-2}
Floating-point can also represent fractions between powers of 2:
0.750=1.5×210.375=1.5×22 0.750 = 1.5 \times 2^{-1}\\ 0.375 = 1.5 \times 2^{-2}
Once the computer determines the floating point representation for a number, it stores that in bits. Modern computers use a 64-bit system that uses 1 bit for the sign, 11 bits for the exponent, and 52 bits for the number in front.
Here's 0, point, 375 in that binary floating-point representation:
0011111111011000000000000000000000000000000000000000000000000000
The exact translation to bits is more complicated than we can go into here, but is a great topic for those of you who want to dive deeper.

Roundoff errors

Floating-point representation still can't fully represent all numbers, however. Consider the fraction 1, slash, 3 and its floating point representation:
1, slash, 3, equals, 1, point, start overline, 3, end overline, times, 2, start superscript, minus, 2, end superscript
In binary, point, start overline, 3, end overline is still an infinitely repeating sequence:
0101010101…
We can't store an infinite sequence in a computer! At some point, the computer has to end the number somehow, either by chopping it off or rounding to the nearest floating point number. Computers have to do that fairly often, as even fractions like 1, slash, 10 (which is a short 0, point, 1 in decimal) end up as infinitely repeating sequences once converted to binary.
We often don't notice the lower precision of a number's representation until we use it in calculations. That's when we can experience a roundoff error in the results.
✏️ Here's a program that attempts to add 0, point, 1, plus, 0, point, 1, plus, 0, point, 1. In the non-computer world, we know that's 0, point, 3. But in the computer, each of the 0, point, 1 values is stored as a rounded-off binary fraction, and when they're added together, they don't quite equal what we expect…
📝 See similar code in: App Lab | Snap | Python
The more bits we can use, the more precise our numbers and calculations will be. Modern 64-bit systems offer a high enough precision for low-stakes calculations.
Perhaps at some point in your life, you'll find yourself writing programs that calculate voting outcomes, power a self-driving car, or even launch a rocket. When the stakes are high, precision matters.

🙋🏽🙋🏻‍♀️🙋🏿‍♂️Do you have any questions about this topic? We'd love to answer— just ask in the questions area below!

Want to join the conversation?

  • mr pink red style avatar for user Lucas Hagemans
    Why is a floating point called that way? What is floating about it?
    (9 votes)
    Default Khan Academy avatar avatar for user
    • hopper jumping style avatar for user pamela ❤
      From the author:Great question. The floating part is the decimal (between the whole part and the fractional part), as floating point representation can both represent very large numbers with a lot of digits before the decimal (like 1292929.1) and very small numbers with a lot of digits after the decimal (like 1.29292929). Floating point representation can use its 52 bits to represent both the digits in the whole part and the digits in the fractional part. A contrasting type of representation is "fixed point", which always uses a certain number of bits to represent the whole part and a certain number of bits to represent the fractional part.
      Here's a nice longer explanation: https://stackoverflow.com/questions/7524838/fixed-point-vs-floating-point-number
      (20 votes)
  • winston default style avatar for user Sasasarthasus
    I was experimenting with this in Swift using doubles and floats. I wrote:
    var result2: Double = 0.1 + 0.1 + 0.1
    var result3: Float = 0.1 + 0.1 + 0.1

    The double returned 0.30000000000000004 and the float returned 0.3. In Swift Double represents a 64-bit floating point number and Float represents a 32-bit floating point number. So it did that because a float is... i guess... Less specific?
    (5 votes)
    Default Khan Academy avatar avatar for user
    • aqualine ultimate style avatar for user Martin
      Funnily enough, yes.

      A number like 0.1 isn't easy store, because it's difficult to turn into a power of two. You see when you tell your computer to work with the number 0.1, the number has to be converted.
      Conversion (as a human mind you!) would work something like (we multiply by two, cut off the 1 if it pops up and don't stop until we hit 0 or we're tired of the algorithm)
      0.1 * 2 = 0.2 | 0
      0.2 * 2 = 0.4 | 0
      0.4 * 2 = 0.8 | 0
      0.8 * 2 = 1.6 | 1
      0.6 * 2 = 1.2 | 1
      0.2 * 2 = 0.4 | 0
      at this point we're starting a loop :(

      so the decimal 0.1 converts to the binary 0.00011001100110011 .... (the string 0011 will just keep repeating making the result more precise but never actually reaching 0.1)
      A double can store twice the amount of information of a floating point, so it can happen that this extra space is used to "fix" rounding issues, by being more precise.
      (11 votes)
  • piceratops tree style avatar for user mkmykmyk
    I realized there is a pattern in binary numbers
    00000
    00001
    00010
    00011
    00100
    00101
    00110
    00111
    01000
    .....
    the last row goes 010101010101010......
    the second last row goes 00110011001100110011........
    the third last row goes 000011110000111100001111.......
    the fourth last row goes 000000001111111100000000111111110000000011111111.....
    the fifth last row goes 00000000000000001111111111111111000000000000000011111111111111110000000000000000111111111111111100000000000000001111111111111111......
    and so on
    (6 votes)
    Default Khan Academy avatar avatar for user
    • blobby green style avatar for user Abhishek Shah
      Nice observation!

      If you look at the period between changing phases (1 -> 0 or 0 -> 1), you'll see you get exactly 2^i which corresponds to the ith row's contribution to the binary representation.

      For instance, the third row changes every 2^2 = 4 cycles. Hence, the 3rd position (starting from 0) contributes 4 to the binary representation.
      (6 votes)
  • aqualine tree style avatar for user Nuzaima
    Why do modern computers only use 64 bits? Can't they use as many bits as they want? Since more bits give more precise calculations.
    (2 votes)
    Default Khan Academy avatar avatar for user
    • duskpin ultimate style avatar for user Mr. Potter
      Hello!
      This subject gets pretty deep, but it boils down to bit and computer architecture. 64 bits is 8^2 so 8 bytes, and as we learned a few lessons ago computers these days only work in chunks of bytes. Then most computers don't need more then 64 bits because it would be overkill unless we were in a professional setting where calculations needed to be exact.

      Thanks for your question!
      (10 votes)
  • aqualine sapling style avatar for user jenanabdrabbi
    I didn't understand how floating point representation number like 0.375 in wrtten in binary.It's getting very confusing for me!
    Somebody help me out..
    (4 votes)
    Default Khan Academy avatar avatar for user
    • starky ultimate style avatar for user KLaudano
      There are 3 parts of a floating-point representation (using the IEEE-754 standard). The first bit is used to determine the sign of the number, 0 is positive and 1 is negative. The next section is the exponent of the number represented in scientific form with an added bias. The final section of the representation is the mantissa of the number in scientific form after dropping the leading 1.

      Suppose we want to convert 0.375 to its floating-point representation.

      0.375 = 0.011 = 1.1 * 2^(-2)

      sign bit = 0 since 0.375 is positive

      exponent = bias + original exponent = 1023 + (-2) = 1021 = 01111111101

      mantissa = number after the leading 1 = 1000000000000000000000000000000000000000000000000000

      floating-point representation = sign bit, exponent, mantissa = 0011111111011000000000000000000000000000000000000000000000000000

      (In this example, the floating-point representation is in double precision format.)
      (3 votes)
  • male robot hal style avatar for user 24rkumar
    shouldn't max value be 32 because 0 is redundant? If there are 5 bits to store a value plus one more to store a sign, that would mean there would be both a +0 and a -0. If we shifted the positive binary values so that zero isnt redundant (0 00001 = -1, 0 00000 = 0, 1 00000 = 1, 1 00001 = 2, etc.), then wouldnt the possible range of integers be from -31 to 32?
    (2 votes)
    Default Khan Academy avatar avatar for user
    • blobby green style avatar for user Abhishek Shah
      Nice observation as this is what has been done in practice! However, one small modification is that a 6-bit number often ranges from -32 to 31. This is because negative numbers have a 1 as the sign bit instead of 0 in your example. One reason to use this convention is to keep consistency.
      Adding 1 should increasing the value of a number.

      In your scheme, if I do
      (-1) + 1 --> 0 00001 + 1
      I should somehow get 0 00000 out of it.

      A more natural way is to use a 1 as the sign bit so (-1) becomes
      1 11111 + 1 --> 0 00000

      Hope this helps!
      (5 votes)
  • piceratops tree style avatar for user Anderson, Emalee
    Why Doesn't an 8-bit calculator experience round off error to the same degree as a 64-bit computer might? If I'm understanding, any number higher than 127 should cause it to overflow. What am I missing?
    (3 votes)
    Default Khan Academy avatar avatar for user
    • aqualine ultimate style avatar for user Martin
      Your computer being a 64-bit machine doesn't really have anything to do with the number format, rather it describes its ability to store computational values. So a 64-bit system is more powerful than a 32-bit or 8-bit system because it can work with more values at once.

      Numbers are stored at a particular location and can take up space independent from your computational power, the same way a computer game can take up 60gb even though you're running a 64bit system.

      Roundoff errors depend on how the system implements numbers, so if a system gives a floating-point number 32 bit it'll be less precise if a system stores the number in a 64bit space.

      You can learn more about floating-point numbers by searching for IEEE 754 on Wikipedia.
      (3 votes)
  • female robot ada style avatar for user Mr. K
    "9,007,199,254,740,992 is equivalent to 2^52 - 1"

    Shouldn't it be odd then? Actually, JS pow function says that 2^53 = 9,007,199,254,740,992.
    (4 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Neev Badu
    I didn't understand the bar over the 3 what was that for ?
    (2 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Samson pinter
    in the examples above on floating point presentation, why do we always multiply by 1? why wont it suffice to just figure out the exponent that the 2 is being represented by. so if were trying to figure out 128 then just do 2x7

    also in the .50 and .25 example what does the negative exponent represent. why does -1 represent .50? etc....
    (2 votes)
    Default Khan Academy avatar avatar for user