Measuring information

How can we quantify/measure an information source? Created by Brit Cruise.

Want to join the conversation?

  • Koen Bender
    I think that for sending the 'poker hand' information there must be a better algorithm because unlike the word, the order does not matter. Is there some algorithm that can do this?
    (31 votes)
    • Cameron
      So here's how you could use less space.....
      There are 52 C 5 possible hands i.e. 52!/(5!*47!)=2598960
      So for all the possible hands sort them in order first by suit (alphabetically clubs,diamonds,hearts, spades), then by rank (Ace to King). Assign the 1st of the sorted hands 1, then 2nd 2.... etc. until you have numbered all the hands.
      i.e. 1st hand would be Ac,2c,3c,4c,5c, 2nd hand would be Ac,2c,3c,4c,6c,.... last hand (2598960th hand) would be 9s,10s,Js,Qs,Ks
      So we now have a table of hands numbered from 1 to 2598960.

      If both sides have this table we can just send the number in the table and both sides will know which hand is being referred to.

      Note:
      log_2(2598960) approximately equals 21.3095

      Compare this to sending log_2(52)=5.7004 bits per card for 5 cards
      for a total of 28.50219 bits

      We saved 28.5022-21.3095= 7.1927 bits

      Here's a bit more detail on where the savings came from:
      - We save 0.29 bits by not having duplicate cards:
        52^5/(52*51*50*49*48) = 1.21909
        log_2(1.21909) = 0.28580
      - We save 6.91 bits by not caring about the order:
        5! permutations = 120
        log_2(120) = 6.90689

      Hope this makes sense
      (61 votes)
  • worldwithoutmin
    About the poker hand: what about sending the card's number and suit separately? For example, first a question about color (red or black), then one about the suit, then questions about the number. Would that be the same as randomly dividing the shuffled deck?
    (16 votes)
    • Varun
      Umm, let's count.
      Color: R/B? One bit.
      Suit: another one bit, because two suits were ruled out by the earlier step.
      Number: is it higher than 7? One bit.
      Is it higher than 4/11? One more bit.
      Is it higher than 2/6/10/12? One more bit.
      And one more bit to make the final guess.
      That adds up to 6 bits, the same as Brit said, I think.
      (25 votes)
  • Chiron
    So it's the board game "Guess Who?"
    (7 votes)
  • Odaxcir
    There's been a lot of talk about predictability and redundancy in this Q&A, and as I was thinking about it, a thought hit me: are predictability and redundancy really all that different? That is, are they really two related phenomena, such that something is predictable because of redundancy?

    Of course, now that I write this, I begin to have my doubts. After all, it's one thing to say the same thing twice; it's another to give the necessary clues for math, logic, and science to deduce with.

    A counterargument, I feel, can be found in the example of the Periodic Table of the Elements, which is a high-redundancy communication (according to The Way Things Work; The Book of the Computer), but which nevertheless seems to me to be filled with a lot more "clues" and fewer repetitions.

    I guess what I am trying to say is, "Does redundancy encompass clues, or repetition only?"

    What do you think?
    (8 votes)
    • Noble Mushtak
      After reading "A Mathematical Theory of Communication" a bit in Research, I know that repetition leads to probability. To find the probability of letters, we must first find repetitions. Redundancy, however, means something extra, something not needed. Although repetition and redundancy have the same literal meaning, they have different connotations. Since redundancy or too much of something is not usually found in English text (over-describing something or repeating something until it becomes unnecessary to do so any more is looked down upon in English texts), I would say that only repetition leads to probability, not redundancy.

      Does this answer your question? I know it's completely different from my previous one.
      (12 votes)
  • Hannah Elaine
    I have a question to do with the second example, the one with the letters. If you are sending the letters as a series of ones and zeros, with absolutely no other symbols or breaks (that would signify something), don't you need more than 28.2 bits?
    Suppose you give the first sixteen letters of the alphabet values with 1s and 0s.
    Such as a = 0000
    b = 0001
    c = 0010
    d = 0011
    and so on. When you hit p you will run out of possibilities and have to use five-symbol ones. Any five-symbol signal you devise will be one of the four-symbol ones with a 1 or 0 tacked on the end (or the beginning, depending on how you think about it).
    If Alice sends six of these signals in a row, all the letters will run into each other. If she sends a five-bit letter followed by a four-bit letter, it will be impossible to tell it apart from a four-bit letter followed by a five-bit one. If she pauses between her signals, that would be like another symbol (she could then choose from 1, 0, or off).
    Supposing Bob sent messages in between hers, asking questions, we would be back down to 4.7 per letter, but Bob would be wasting a lot of time encoding each of his questions letter by letter. It would still be more efficient for Alice to encode each letter into a five-bit symbol and for Bob to recognize that each group of five bits represents one letter.
    Also, won't Alice have to send additional information at the beginning of each message clarifying whether she's sending words or numbers or cards or whatever else?
    (6 votes)
  • elizkuhlman
    The video says that information isn't always random, so couldn't one arrange the most common or predictable information so that it takes fewer bits to transmit? For instance, where the number of questions can be at least four but at most five, the most common letters could be arranged to take only four bits and the least common to take five, so the average number of questions could be reduced, leading to a more streamlined information transmission system.
    (5 votes)
  • Varun
    Can someone explain what a logarithm is to me? I'm only in Algebra, and I haven't learned that yet. What kind of a function is it?
    (0 votes)
  • Jared Desai
    Did anyone else notice that at the end, the alphabet has x and y switched?
    (5 votes)
  • ElectrifyPro
    What are the 5 cards in a poker hand?
    (3 votes)
  • Rey #FilmmakerForLife #EstelioVeleth.
    So could you actually put in a 1 for a dit and a 0 for a dah, and so on? Like, instead of - - - ---, 1110?
    (3 votes)
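
Cameron's numbered-table idea from the poker-hand thread above can be sketched without storing the table at all. The following is only an illustration under stated assumptions: the card spelling ("Ac", "10s"), the helper names `card_number` and `hand_index`, and the choice to compute the position arithmetically rather than look it up are all hypothetical and not part of the original comment. The ordering follows the comment: suits alphabetically, then rank from Ace to King.

```python
import math

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["c", "d", "h", "s"]  # clubs, diamonds, hearts, spades (alphabetical)


def card_number(card):
    """Map a card such as 'Ac' or '10s' to 0..51, ordered by suit, then rank."""
    rank, suit = card[:-1], card[-1]
    return SUITS.index(suit) * 13 + RANKS.index(rank)


def hand_index(hand):
    """Return the 1-based position of a 5-card hand in the sorted table."""
    cards = sorted(card_number(c) for c in hand)
    k = len(cards)
    index = 0
    previous = -1
    for i, c in enumerate(cards):
        # Count the hands that share the first i cards but hold a smaller
        # card in this position; the rest come from the cards above it.
        for smaller in range(previous + 1, c):
            index += math.comb(51 - smaller, k - 1 - i)
        previous = c
    return index + 1


total_hands = math.comb(52, 5)            # size of the table
bits_for_table = math.log2(total_hands)   # bits to send one table position
bits_card_by_card = 5 * math.log2(52)     # bits when sending each card separately

print(hand_index(["Ac", "2c", "3c", "4c", "5c"]))
print(hand_index(["9s", "10s", "Js", "Qs", "Ks"]), "of", total_hands)
print(f"saved: {bits_card_by_card - bits_for_table:.4f} bits")
```

Because the order of the cards is discarded by sorting, any of the 120 arrangements of the same five cards gives the same position, which is where most of the saving comes from.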

Video transcript

Voiceover: Consider the following. Alice and Bob have figured out how to transmit messages between their treehouses. At first, they used flames at night and shutters during the day. Then they used a wire, which they plucked in different ways. Eventually, they electrified this wire to send electrical pulses and were now at work on an experimental wireless method.

The problem is, in order to pay for their equipment, they needed money. So, they decided to offer their service for a fee to others. And on the first day, Alice had three new customers who wanted to transmit messages to their friends over at Bob's treehouse. The first customer wanted to send a list of 10 coin flips, the second customer wanted to send a six-letter word, and the third customer wanted to send a poker hand. The question now is, how much should she charge?

Well, the price of a message should depend on how long it takes Alice to transmit it. But how could she measure the length of different types of messages using a common unit? To find out, let's play a game. Imagine you were Bob now, and you know Alice wants to send you these messages, but all you can do is get the answer to yes or no questions you've arranged. Alice will answer by sending a sequence of zeros or ones, using some method of variation. Recall that all their methods of transmission involved the exchange of differences. So, a one could be represented by an open flame or an open shutter or an electrical pulse. No matter how they are manifested, we can simply call them binary digits, because a binary digit can have only one of two values, zero or one. So, let's say zero represents a no and one represents a yes. Your challenge now is to always ask the minimum number of questions in order to determine the exact message.

First, let's consider the coin flips. For each symbol, the sender, Alice, can be thought of as selecting one of two different symbols, heads or tails.
Now, how many questions do you need to ask to determine which she selected? One question, such as "Is it heads?", will suffice. For 10 flips, what is the minimum number of questions? Well, 10 flips times one question per flip equals 10 questions, or 10 binary digits, to transmit this message.

Next, let's consider the letters. For each symbol, the sender, Alice, can be thought of as selecting one of 26 different symbols. Let's start with the simplest message, which is one letter. How many questions are needed? Is it A? Is it B? Is it C? Is it D? And so on, but that is not the minimum number of questions. The best you could do is ask questions which eliminate half of the possibilities.

For example, the middle of the alphabet is between M and N. So, we could first ask, is it less than N? If we receive a one, yes, we cut out half of the possibilities, which leaves 13, and since we can't split a letter in half, we divide the possible symbols into sets of six and seven and ask, is it less than G? We receive a one, which is yes, and now we are left with six possible letters, and we can split them in half and ask, is it less than D? We receive a zero, which is no, leaving us with three possible letters, and now we can pick a side and ask, is it D? We receive a zero, which is no, and finally we are left with two possibilities. We ask, is it E? We receive a no, and after five questions, we have correctly identified the symbol, F.

Realize that we will never need to ask more than five questions, so the number of questions will be at least four and at most five, and in general, two to the power of the number of questions equals the number of possible messages, which we previously defined as the message space. So, how can we calculate the exact average or expected number of questions, given a message space of 26? We ask the reverse question.
Two to the power of something equals 26, and to answer these types of questions, we naturally use a logarithmic function, base two, because log base two of 26 is the exponent two needs to be raised to, to give us 26, which is approximately 4.7. So, on average, approximately 4.7 questions will be needed per letter at minimum, and since she wants to transmit a word with six letters, Bob can expect to ask, at minimum, 28.2 questions, which means Alice will need to send, at most, 29 binary digits.

Finally, let's apply this formula to a new message, the poker hand. Well, for each symbol, the sender, Alice, can be thought of as selecting one of 52 different symbols, and in this case, the number of questions is the same as the number of times we need to split the deck and ask Alice which pile it is in, until we are left with one card, which we will find is usually six splits or questions and sometimes five. But we can save time and just use our equation. Log base two of 52 is approximately 5.7, since two to the power of 5.7 is approximately 52. So, the minimum number of questions on average is 5.7 per card. A poker hand contains five cards. So, to transmit a poker hand requires 28.5 questions on average.

We are done. We now have our unit. It's based on the minimum number of questions to define the message, or the height of the decision tree, and since Alice transmits this information as binary digits, we can shorten this and call our unit the bit, instead of binary digit. So, we have: the 10 coin flips require 10 bits, the six-letter word requires 28.2 bits, and the poker hand requires 28.5 bits. Alice then decides to charge one penny per bit and begins collecting her fees.

Now, this idea emerged in the 1920s. It was one of the more abstract problems that communication engineers were thinking about.
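
The halving game described above for the letters and the deck can be played out in a few lines. This is a minimal sketch, not a definitive model of the video's procedure: the function names `questions_needed` and `summarize` and the numbering of the cards from 0 to 51 are assumptions made for illustration. Each question splits the remaining candidates into a first and second pile as evenly as possible.

```python
import math


def questions_needed(symbols, target):
    """Count the yes/no questions asked when each question halves the candidates."""
    candidates = list(symbols)
    asked = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        asked += 1  # one question: "is it in the first pile?"
        candidates = first if target in first else second
    return asked


def summarize(symbols):
    """Return (fewest, most, average) questions over every possible target."""
    counts = [questions_needed(symbols, t) for t in symbols]
    return min(counts), max(counts), sum(counts) / len(counts)


letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
deck = list(range(52))  # hypothetical numbering of the 52 cards

letter_stats = summarize(letters)
card_stats = summarize(deck)

print("letters:", letter_stats, "log2(26) =", round(math.log2(26), 2))
print("cards:  ", card_stats, "log2(52) =", round(math.log2(52), 2))
```

Run on the letters, the game takes four or five questions, and on the deck five or six, as in the transcript. The average when symbols are guessed one at a time comes out slightly above the logarithm, because each individual symbol costs a whole number of questions; the logarithm is the figure that whole messages can approach.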
Ralph Hartley was a prolific electronics researcher who built on the ideas of Harry Nyquist, both of whom worked at Bell Labs after World War I. In 1928, Hartley published an important paper titled Transmission of Information, and in it he defines the word information using the symbol H, as H equals N times the logarithm of S, where H is our information, N is the number of symbols, whether they're notes, letters, numbers, etc., and S is the number of different symbols available at each selection. This can also be written as H equals the logarithm of S to the power of N, and Hartley writes, "What we have done then is to take as our practical measure of information the logarithm of the number of possible symbol sequences."

So, information is the logarithm of the message space. However, realize that throughout this lesson we have been assuming that the symbol selection is random, a convenient simplification. We know that in reality most communication, such as speech, isn't always random. It's a subtle mix of predictability and surprise. We do not roll dice when we write letters, and it is this predictability which can result in significant savings in the length of transmission, because when we can predict things in advance, we shouldn't need to ask as many yes or no questions to define it. But how could we formally model this subtle difference? This question brings us to the key insight in our story. Can you think of what it might be?
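
Written out, the formula quoted from Hartley and the three totals from Alice's price list look as follows. The lowercase symbols n and s stand for the transcript's N and S, and taking the logarithm to base two, so that the unit is the bit, is an assumption carried over from the earlier part of the lesson; the poker hand is treated, as in the video, as five selections from 52 symbols.

```latex
H = n \log s = \log s^{n}

\begin{aligned}
\text{10 coin flips:}   \quad & H = 10 \log_2 2 = 10 \text{ bits} \\
\text{six-letter word:} \quad & H = 6 \log_2 26 \approx 6 \times 4.70 \approx 28.2 \text{ bits} \\
\text{poker hand:}      \quad & H = 5 \log_2 52 \approx 5 \times 5.70 \approx 28.5 \text{ bits}
\end{aligned}
```

The second form, the logarithm of s to the power n, is the "logarithm of the number of possible symbol sequences" in Hartley's quotation: a six-letter word is one choice out of 26^6 possible sequences.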