If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

# Measuring information

## Video transcript

consider the following Alice and Bob have figured out how to transmit messages between their tree houses at first they use flames at night and shudders during the day then they used a wire which they plucked in different ways eventually they electrified this wire to send electrical pulses and were now at work on an experimental wireless method the problem is in order to pay for their equipment they needed money so they decided to offer their service for a fee to others and on the first day Alice had three new customers who wanted to transmit messages to their friends over at Bob's treehouse the first customer wanted to send a list of 10 coin flips the second customer wanted to send a six letter word and the third customer wanted to send a poker hand the question now is how much did she charge well the price of a message should depend on how long it takes Alice to transmit it but how could she measure the length of different types of messages using a common unit to find out let's play a game imagine you were Bob now and you know Alice wants to send you these messages but all you can do is get the answer to yes-or-no questions you've arranged Alice will answer by sending a sequence of zeros or ones using some method of variation recall that all their methods of transmission involve the exchange of differences so a 1 could be represented by an open flame or an open shutter or an electrical pulse no matter how they are manifested we can simply call them binary digits because a binary digit can have only one of two values 0 or 1 so let's say 0 represents an O and 1 represents a yes your challenge now is to always ask the minimum number of questions in order to determine the exact message first let's consider the coin flips for each symbol the sender Alice can be thought of as selecting one of two different symbols heads or tails now how many questions do you need to ask to determine which she selected one question such as is it heads will suffice for 10 flips what is a minimum number of questions both 10 flips times one question per flip equals ten questions or ten binary digits to transmit this message next let's consider the letters for each symbol the sender Alice can be thought of as selecting one of 26 different symbols let's start with the simplest message which is one letter how many questions are needed is it a is it B is it C is it D and so on but that is not the minimum number of questions the best you could do is ask questions which eliminate half of the possibilities for example the middle of the alphabet is between m and n so we could first ask is it less than n if we receive a 1 yes we cut out half of the possibilities which leaves 13 left and since we can't split a letter in half we divide the possible symbols in two sets of 6 and 7 and ask is it less than G we receive a 1 which is yes and now we are left with 6 possible letters and we can split them in half and ask is it less than D we receive a 0 which is no leaving us with 3 possible letters and now we can pick a side and ask is it d we receive a 0 which is no and finally we are left with two possibilities we ask is it e we receive a no and after 5 questions we have correctly identified the symbol F realize that we will never need to ask more than five questions so the number of questions will be at least four and at most five and in general to ^ number of questions equals the number of possible messages which we previously defined as the message space so how can we calculate the exact average or expected number of questions given a message space of 26 we asked the reverse question two to the power of something equals 26 and to answer these types of questions we naturally use a logarithmic function base two because log base two of 26 is the exponent two needs to be raised to to give us 26 which is approximately four point seven so on average approximately four point seven questions will be needed per letter at minimum and since she wants to transmit a word with six letters Bob can expect to ask at minimum twenty eight point two questions which means Alice will need to send at most twenty-nine binary digits finally let's apply this formula to a new message the poker hand well for each symbol the sender Alice can be thought of as selecting one of 52 different symbols and in this case the number of questions is the same as the number of times we need to split the deck and ask Alice which pilot is in until we are left with one card which we will find is usually six splits or questions and sometimes five but we can save time and just use our equation log base two of 52 is approximately five point seven since two to the power of five point seven is approximately 52 so the minimum number of questions on average is five point seven per card a poker hand contains five cards to transmit a poker hand requires 28.5 questions on average we are done we now have our unit it's based on the minimum number of questions to define the message or the height of the decision tree and since Alice transmits this information as binary digits we can shorten this and call our unit the bit instead of binary digit so we have ten coin flips requires ten bits the six letter word requires twenty eight point two bits and the poker hand requires twenty eight point five bits Alice then decides to charge one penny per bit and begins collecting her fees now this idea emerged in the 1920s it was one of the more abstract problems that communication engineers were thinking about Ralph Hartley was a prolific electronics researcher who built on the ideas of Harry Nyquist both of whom worked at Bell Labs after World War 1 and in 1928 Hartley published an important paper titled the transmission of information and in it he defines the word information using the symbol H as H equals 10 times the logarithm of s where each is our information n is the number of symbols whether they're notes letters numbers etc and s is the number of different symbols available at each selection and this can also be written as H equals the logarithm of s to the power of n and Hartley writes what we have done then is to take as our practical measure of information the logarithm of the number of possible symbols sequences so information is the logarithm of the message space however realize that throughout this lesson we have been assuming that the symbol selection is random convenient simplification however we know that in reality most communication such as speech isn't always random it's a subtle mix of predictability and surprise you do not roll dice when we write letters and it is this predictability which can result in significant savings in the length of transmission because when we can predict things in advance we shouldn't need to ask as many yes-or-no questions to define it but how could we formally model this subtle difference this question brings us to the key insight in our story can you think of what it might be