Main content
Running time of binary search
We know that linear search on an array of elements might have to make as many as guesses. You probably already have an intuitive idea that binary search makes fewer guesses than linear search. You even might have perceived that the difference between the worst-case number of guesses for linear search and binary search becomes more striking as the array length increases. Let's see how to analyze the maximum number of guesses that binary search makes.
The key idea is that when binary search makes an incorrect guess, the portion of the array that contains reasonable guesses is reduced by at least half. If the reasonable portion had 32 elements, then an incorrect guess cuts it down to have at most 16. Binary search halves the size of the reasonable portion upon every incorrect guess.
If we start with an array of length 8, then incorrect guesses reduce the size of the reasonable portion to 4, then 2, and then 1. Once the reasonable portion contains just one element, no further guesses occur; the guess for the 1-element portion is either correct or incorrect, and we're done. So with an array of length 8, binary search needs at most four guesses.
What do you think would happen with an array of 16 elements? If you said that the first guess would eliminate at least 8 elements, so that at most 8 remain, you're getting the picture. So with 16 elements, we need at most five guesses.
By now, you're probably seeing the pattern. Every time we double the size of the array, we need at most one more guess. Suppose we need at most guesses for an array of length . Then, for an array of length , the first guess cuts the reasonable portion of the array down to size , and at most guesses finish up, giving us a total of at most guesses.
Let's look at the general case of an array of length . We can express the number of guesses, in the worst case, as "the number of times we can repeatedly halve, starting at , until we get the value 1, plus one." But that's inconvenient to write out.
Fortunately, there's a mathematical function that means the same thing as the number of times we repeatedly halve, starting at , until we get the value 1: the base-2 logarithm of . That's most often written as , but you may also see it written as in computer science writings. (Want to review logarithms? Learn more here.)
Here's a table showing the base-2 logarithms of various values of :
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
16 | 4 |
32 | 5 |
64 | 6 |
128 | 7 |
256 | 8 |
512 | 9 |
1024 | 10 |
1,048,576 | 20 |
2,097,152 | 21 |
We can view this same table as a graph:
Zooming in on smaller values of n:
The logarithm function grows very slowly. Logarithms are the inverse of exponentials, which grow very rapidly, so that if , then . For example, because , we know that .
That makes it easy to calculate the runtime of a binary search algorithm on an that's exactly a power of 2. If is 128, binary search will require at most 8 ( ) guesses.
What if isn't a power of 2? In that case, we can look at the closest lower power of 2. For an array whose length is 1000, the closest lower power of 2 is 512, which equals . We can thus estimate that is a number greater than 9 and less than 10, or use a calculator to see that its about 9.97. Adding one to that yields about 10.97. In the case of a decimal number, we round down to find the actual number of guesses. Therefore, for a 1000-element array, binary search would require at most 10 guesses.
For the Tycho-2 star catalog with 2,539,913 stars, the closest lower power of 2 is (which is 2,097,152), so we would need at most 22 guesses. Much better than linear search!
Compare vs below:
In the next tutorial, we'll see how computer scientists characterize the running times of linear search and binary search, using a notation that distills the most important part of the running time and discards the less important parts.
This content is a collaboration of Dartmouth Computer Science professors Thomas Cormen and Devin Balkcom, plus the Khan Academy computing curriculum team. The content is licensed CC-BY-NC-SA.
Want to join the conversation?
- I just don't get the +1 part. If lg(1000) = 9.96 so you add 1 I could understand but lg(1024) equals 10 right away, why + 1 to make it 11?(33 votes)
- Let me start out by saying that, I really wouldn't worry too much about the +1 too much. It's not important. What's important is that the number of guesses is on the order of log_2(n). If you still want to know where the +1 comes from, then read on.
For the implementation of the binary search specified:
max. # guesses = floor(log_2(n))+1
Which means that:
n=512 to 1023 require max. of 10 guesses
n=1024 to 2047 requires max. of 11 guesses
So, where does the +1 come from ?
Don't forget that, even when you have only 1 element, you still have to guess it, because it is possible that the target is not in the array. So we need to add on the +1 for the last guess when we are down to the final remaining element.
It may be clearer if you think about when n=1. We still need 1 guess, but log_2(1) = 0. So we need to add a +1 to fix up our formula.
(Alternatively, if you are familiar with binary trees https://en.wikipedia.org/wiki/Binary_tree you can make a "complete" binary tree where each level is a choice and the nodes are the guesses. The first level is level 0. The number of guesses required to reach an element is the level it is sitting on +1. )
If you really want to understand what is going on, then you could try to confirm the following by working through the algorithm by hand:n max. # of guesses
1 1
2 2
3 2
4 3
5 3
6 3
7 3
8 4
9 4
...
Example:
Suppose we have this array: [2, 3, 5, 7, 11, 13, 17, 19] (8 elements)
and we search for: 19
Here's the values of min, max and guess on each iteration:
min: 0 max: 7 guess: 3
min: 4 max: 7 guess: 5
min: 6 max: 7 guess: 6
min: 7 max: 7 guess: 7
# of guesses: 4
Hope this makes sense(147 votes)
- Why is it called binary search if there are three options?
1. Higher
2. Lower
3. Correct(14 votes)- "Binary" doesn't refer to the number of possible outcomes from the guess, but to the method of dividing the pool of possibilities into two parts and treating each of them as only one "thing."(55 votes)
- whats the difference between an algorithm and a logarithm(2 votes)
- An algorithm is a precise rule (or set of rules) specifying how to solve some problem, while a logarithm is the exponent required to produce a given number.(13 votes)
- I know this is not related to the article. Still...
In the previous challenge, how do I
the total guesses? (Step 3)println()
(0 votes)- You need to create a variable, increment the variable each time a guess is made, and finally print it out when the target is found.(17 votes)
- Doesn't it show powers of 2 in the table?(4 votes)
- Yes. log is the inverse operation of exponentiation.
So log_2( 2^x ) = x
e.g. log_2( 2^6 ) = log_2( 64) = 6(3 votes)
- I want to make sure +1 thing.
If n=3, the maximum guesses are 3?
since 2^2=4, 2+1 =3(as the article said)
But I still don't get it.
When I work through the algorithm by hand, it goes like this.
n max. # of guesses
1 1
2 2
3 2
4 3
5 3
6 3
7 3
8 4
9 4
...
Then when n=1000, it seems like the maximum guesses are 10.(3 votes)- The error in your calculations is that you did
log₂(4)+1
instead oflog₂(2)+1
. You need to take the logarithm of the closest lower power of 2 (2), not closest higher (4). You can see thatlog₂(2)+1 = 1+1 = 2
.(3 votes)
- At the end of the 3rd paragraph it states that...with an array of length 8, binary search needs at most four guesses. But how can this be true as if it starts with 8 the most guesses possible would be 3 and on the 3rd guess you would end up with the answer. I am not sure if this is what would happen cause I don't get it...(4 votes)
- In an array of length 8, how is it that we need four guesses? Please make me clear.(1 vote)
- Three to narrow down the position, plus one more to tell if your final guess is right or not.
Let's suppose that you're looking for an element that happens to be at position 3 (where the first element is position 1). Before you make any guesses, your element could be anywhere in the range 1-8. After the first guess, you can rule out half the elements; you now know that the element you're looking for is in elements 1-4. On your next guess, you can rule out another half, so you're down to elements 3-4. On your third guess, you can tell that it's either at element 3, or doesn't exist. But you need to make one more guess to make sure that element 3 is the one you're after; it might be different, after all!(6 votes)
- for the challenge; I have created the below code with Python language, appreciate your commends if you have any suggestions to make it better: ( it seems that the indented lines look like normal lines here, sorry for that )
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
mn = 0
mx = len(primes)
target = 100
for i in range(10):
guess = int((mn+mx)/2)
if primes[guess] == target:
print ("target found in index number : "+ str(primes.index(primes[guess])))
break
elif mx == 0 or mn == len(primes)-1 or mx == mn + 1:
print ("number not found")
break
elif primes[guess] > target:
mx = guess
elif primes[guess] < target:
mn = guess(2 votes) - Does n10 means there are 6 guesses at max?(3 votes)