
© 2023 Khan Academy · Terms of use · Privacy Policy · Cookie Notice

# Measuring information

How can we quantify/measure an information source? Created by Brit Cruise.

## Want to join the conversation?

- I think that for sending the 'poker hand' information there must be a better algorithm, because, unlike the word, the order does not matter. Is there some algorithm that can do this? (31 votes)
- So here's how you could use less space...

There are 52 C 5 possible hands, i.e. 52!/(5!*47!) = 2,598,960.

So take all the possible hands and sort them, first by suit (alphabetically: clubs, diamonds, hearts, spades), then by rank (Ace to King). Assign the 1st of the sorted hands the number 1, the 2nd the number 2, and so on, until you have numbered all the hands.

i.e. the 1st hand would be Ac,2c,3c,4c,5c; the 2nd hand would be Ac,2c,3c,4c,6c; ... and the last (2,598,960th) hand would be 9s,10s,Js,Qs,Ks.

So we now have a table of hands numbered from 1 to 2,598,960. If both sides have this table, we can just send the number in the table, and both sides will know which hand is being referred to.

Note:

log_2(2,598,960) ≈ 21.3095

Compare this to sending log_2(52) = 5.7004 bits per card for 5 cards, a total of 28.5022 bits.

We saved 28.5022 - 21.3095 = 7.1927 bits.

Here's a bit more detail on where the savings came from:

- We save 0.29 bits by not having duplicate cards:

52^5/(52*51*50*49*48) = 1.21909

log_2(1.21909) = 0.28580

- We save 6.91 bits by not caring about the order:

5! permutations = 120

log_2(120) = 6.90689

Hope this makes sense. (59 votes)
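The table idea above can be implemented without storing any table at all, using the combinatorial number system. The sketch below (my own illustration, not part of the original answer) numbers cards 0..51 in the sorted order the answer describes and maps each 5-card hand to a unique number in 0..C(52,5)−1, plus the inverse mapping:

```python
from math import comb

def hand_rank(hand):
    """Map a 5-card hand (card indices 0..51) to a unique number
    in 0..comb(52, 5) - 1 via the combinatorial number system."""
    cards = sorted(hand)
    # Summing comb(card, k + 1) over the k-th smallest card is a
    # bijection between 5-subsets and 0..comb(52, 5) - 1.
    return sum(comb(c, k + 1) for k, c in enumerate(cards))

def hand_unrank(rank, k=5):
    """Invert hand_rank: recover the k card indices from the number."""
    hand = []
    for i in range(k, 0, -1):
        # Find the largest c with comb(c, i) <= rank.
        c = i - 1
        while comb(c + 1, i) <= rank:
            c += 1
        hand.append(c)
        rank -= comb(c, i)
    return sorted(hand)
```

Both directions run in negligible time, so sender and receiver need only agree on the card ordering, not on a 2,598,960-row table.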

- About the poker hand, what about sending the card's number and suit separately? For example, first a question about color, red or black; then about the suit; then about the number. Would it be the same as randomly dividing the shuffled deck? (16 votes)
- Umm, let's count:

Color: R/B - one bit

Suit: this is another one bit, because two suits were ruled out by the earlier step

Number: Is it higher than 7? One bit

Is it higher than 4/11? One more bit

Higher than 2/6/10/12? Another bit

And one more bit to make the final guess

That adds up to 6 bits, about the same as Brit said, I think. (24 votes)
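The tally in that reply can be checked mechanically: asking about each field separately costs the ceiling of log2 of that field's number of choices, which comes out slightly above the theoretical floor of log2(52). A quick sketch (field names are my own labels):

```python
from math import ceil, log2

# Choices per field when a card is described piecewise,
# as in the reply above.
fields = {"color": 2, "suit_within_color": 2, "rank": 13}

# Each field needs ceil(log2(choices)) yes/no questions.
per_field = {name: ceil(log2(n)) for name, n in fields.items()}
total = sum(per_field.values())  # 1 + 1 + 4 = 6 questions
ideal = log2(52)                 # about 5.70 bits, the average minimum
```

The gap (6 vs. ~5.7) comes from rounding each field up to a whole question; asking about the card as a single symbol out of 52 avoids some of that rounding.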

- So it's the board game "Guess Who?" (6 votes)
- Pretty much. In the words of Sherlock Holmes (the books, not the horribly romanticized violent movies): "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." (20 votes)

- There's been a lot of talk about predictability and redundancy in this Q&A, and as I was thinking about it, a thought hit me: are predictability and redundancy really all that different? That is, are they two related phenomena, such that something is predictable because of redundancy?

Of course, now that I write this, I begin to have my doubts. After all, it's one thing to say the same thing twice; it's another to give the necessary clues for math, logic, and science to deduce with.

A counterargument, I feel, can be found in the example of the Periodic Table of the Elements, which is a high-redundancy communication (according to *The Way Things Work: The Book of the Computer*), but which nevertheless seems to me to be filled with a lot more "clues" and fewer repetitions.

I guess what I am trying to say is, "Does redundancy encompass clues, or repetition only?"

What do you think? (8 votes)
- After reading "A Mathematical Theory of Communication" a bit in Research, I know that repetition leads to probability. To find the probability of letters, we must first find repetitions. Redundancy, however, means something extra, something not needed. Although repetition and redundancy have the same literal meaning, they have different connotations. Since redundancy, or too much of something, is not usually found in English text (over-describing something, or repeating something until it becomes unnecessary to do so any more, is looked down upon in English texts), I would say that only repetition leads to probability, not redundancy.

Does this answer your question? I know it's completely different from my previous one. (12 votes)

- I have a question to do with the second example, the one with the letters. If you are sending the letters in a series of ones and zeros, with absolutely no other symbols or breaks (that would signify something), don't you have to have more than 28.2?

Suppose you give the first sixteen letters of the alphabet values with 1s and 0s,

such as a = 0000

b = 0001

c = 0010

d = 0011

and so on. When you hit p, you will run out of possibilities and have to use five-symbol codes. Any five-symbol signal you devise will be one of the four-symbol ones with a 1 or 0 tacked on the end (or the beginning, depending on how you think about it).

If Alice sends six of these signals in a row, all the letters will run into each other. If she sends a five-bit letter followed by a four-bit letter, it will be impossible to tell apart from a four-bit letter followed by a five-bit one. If she pauses between her signals, that would be like another symbol (she could then choose from 1, 0, or off).

Supposing Bob sent messages in between hers, asking questions, we would be back down to 4.7 per letter, but Bob would be wasting a lot of time encoding each of his questions letter by letter. It would still be more efficient for Alice to encode each letter as a five-bit symbol and for Bob to recognize that each five bits represented one letter.

Also, won't Alice have to send additional information at the beginning of each message clarifying whether she's sending words or numbers or cards or whatever else? (6 votes)
- Thanks, I'm answering this so I could find useful questions from my profile page. (1 vote)
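The fixed-width scheme the question converges on can be sketched in a few lines. This is an illustration of the idea, not anything from the video: every letter gets exactly 5 bits (2^5 = 32 ≥ 26), so the receiver can cut the stream every 5 bits without any separator symbol; the a=0 .. z=25 numbering is an assumption for the example:

```python
# Fixed-width 5-bit code for lowercase letters: no separators needed,
# because the receiver always cuts the stream every 5 bits.
def encode(word):
    return "".join(format(ord(ch) - ord("a"), "05b") for ch in word)

def decode(bits):
    return "".join(chr(int(bits[i:i + 5], 2) + ord("a"))
                   for i in range(0, len(bits), 5))
```

This confirms the question's point: a self-delimiting single-message code costs 5 bits per letter (30 for a six-letter word), slightly above the 28.2-bit average minimum, which assumes an interactive question game or coding across many letters at once.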

- At 9:05 it says that information isn't always random, so couldn't one arrange the most common or predictable information so that it takes fewer bits to transmit? Like at 4:46, where the number of questions can be at least four but at most five: the most common letters could be arranged to take only four bits and the least common to take five, so the average number of bits could be reduced, leading to a more streamlined information transmission system. (5 votes)
- This is actually what Morse code does, so yes, this might be the "next step". (4 votes)
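The saving that question describes is easy to quantify. A toy sketch (the probabilities and code lengths below are made up for illustration; real variable-length codes such as Huffman codes formalize this) of giving common letters 4-bit codes and rare ones 5-bit codes:

```python
# Assumed split: 70% of letters sent are "common" (4-bit codes),
# 30% are "rare" (5-bit codes). Compare the expected cost per
# letter with the flat 5-bit scheme.
freq = {"common": 0.70, "rare": 0.30}
length = {"common": 4, "rare": 5}

avg = sum(freq[g] * length[g] for g in freq)  # 0.7*4 + 0.3*5 = 4.3 bits
flat = 5                                      # fixed-width cost per letter
```

Under these assumed frequencies, the weighted average drops to 4.3 bits per letter, which is exactly the kind of saving Morse code exploits by giving E a single dot.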

- Can someone explain what a logarithm is to me? I'm only in Algebra, and I haven't learned that yet. What kind of a function is it? (0 votes)
- Did anyone else notice that at the end, the alphabet has x and y switched? (5 votes)
- Haha, no, I did not notice! You must have been paying great attention! (1 vote)

- What are the 5 cards in a poker hand? (3 votes)
- Any 5 cards from a standard deck comprise a poker hand. (4 votes)

- So could you actually put in a 1 for a dit, and a 0 for a dah... and so on? Like, instead of dit dit dit dah, 1110? (3 votes)
- Yes. That is probably how a computer would read it. (3 votes)

## Video transcript

Voiceover: Consider the following: Alice and Bob have figured out how to transmit messages between their treehouses. At first, they used flames at night and shutters during the day. Then they used a wire, which they plucked in different ways. Eventually, they electrified this wire to send electrical pulses and were now at work on an experimental wireless method. The problem is, in order to pay for their equipment, they needed money. So, they decided to offer their service for a fee to others. And on the first day, Alice had three new customers who wanted to transmit messages to their friends over at Bob's treehouse. The first customer wanted to send a list of 10 coin flips, the second customer wanted to send a six-letter word, and the third customer wanted to send a poker hand. The question now is, how much should she charge? Well, the price of a message should depend on how long it takes Alice to transmit it. But how could she measure the length of different types of messages using a common unit? To find out, let's play a game.

Imagine you were Bob now, and you know Alice wants to send you these messages, but all you can do is get the answer to yes-or-no questions you've arranged. Alice will answer by sending a sequence of zeros or ones, using some method of variation. Recall that all their methods of transmission involved the exchange of differences. So, a one could be represented by an open flame or an open shutter or an electrical pulse. No matter how they are manifested, we can simply call them binary digits, because a binary digit can have only one of two values, zero or one. So, let's say zero represents a no and one represents a yes. Your challenge now is to always ask the minimum number of questions in order to determine the exact message.

First, let's consider the coin flips. For each symbol, the sender, Alice, can be thought of as selecting one of two different symbols, heads or tails. Now, how many questions do you need to ask to determine which she selected? One question, such as "is it heads?", will suffice. For 10 flips, what is the minimum number of questions? Well, 10 flips times one question per flip equals 10 questions, or 10 binary digits, to transmit this message.

Next, let's consider the letters. For each symbol, the sender, Alice, can be thought of as selecting one of 26 different symbols. Let's start with the simplest message, which is one letter. How many questions are needed? Is it A? Is it B? Is it C? Is it D? And so on. But that is not the minimum number of questions. The best you could do is ask questions which eliminate half of the possibilities. For example, the middle of the alphabet is between M and N. So, we could first ask, "is it less than N?" If we receive a one, yes, we cut out half of the possibilities, which leaves 13 left, and since we can't split a letter in half, we divide the possible symbols into sets of six and seven and ask, "is it less than G?" We receive a one, which is yes, and now we are left with six possible letters, and we can split them in half and ask, "is it less than D?" We receive a zero, which is no, leaving us with three possible letters, and now we can pick a side and ask, "is it D?" We receive a zero, which is no, and finally we are left with two possibilities. We ask, "is it E?" We receive a no, and after five questions we have correctly identified the symbol: F.

Realize that we will never need to ask more than five questions, so the number of questions will be at least four and at most five. And in general, two to the power of the number of questions equals the number of possible messages, which we previously defined as the message space. So, how can we calculate the exact average, or expected, number of questions, given a message space of 26? We ask the reverse question: two to the power of what equals 26? To answer these types of questions, we naturally use a logarithmic function, base two, because log base two of 26 is the exponent two needs to be raised to to give us 26, which is approximately 4.7. So, on average, approximately 4.7 questions will be needed per letter at minimum, and since she wants to transmit a word with six letters, Bob can expect to ask, at minimum, 28.2 questions, which means Alice will need to send, at most, 29 binary digits.

Finally, let's apply this formula to a new message, the poker hand. Well, for each symbol, the sender, Alice, can be thought of as selecting one of 52 different symbols, and in this case, the number of questions is the same as the number of times we need to split the deck and ask Alice which pile it is in, until we are left with one card, which we will find is usually six splits, or questions, and sometimes five. But we can save time and just use our equation. Log base two of 52 is approximately 5.7, since two to the power of 5.7 is approximately 52. So, the minimum number of questions on average is 5.7 per card. A poker hand contains five cards, so to transmit a poker hand requires 28.5 questions on average.

We are done. We now have our unit. It's based on the minimum number of questions to define the message, or the height of the decision tree, and since Alice transmits this information as binary digits, we can shorten this and call our unit the bit, instead of binary digit. So, we have: 10 coin flips requires 10 bits, the six-letter word requires 28.2 bits, and the poker hand requires 28.5 bits. Alice then decides to charge one penny per bit and begins collecting her fees.

Now, this idea emerged in the 1920s. It was one of the more abstract problems that communication engineers were thinking about. Ralph Hartley was a prolific electronics researcher who built on the ideas of Harry Nyquist, both of whom worked at Bell Labs after World War I. In 1928, Hartley published an important paper titled "Transmission of Information," and in it he defines the word information using the symbol H, as H = N log S, where H is our information, N is the number of symbols, whether they're notes, letters, numbers, etc., and S is the number of different symbols available at each selection. This can also be written as H = log S^N. And Hartley writes, "What we have done then is to take as our practical measure of information the logarithm of the number of possible symbol sequences." So, information is the logarithm of the message space.

However, realize that throughout this lesson we have been assuming that the symbol selection is random, a convenient simplification; we know that in reality most communication, such as speech, isn't always random. It's a subtle mix of predictability and surprise. We do not roll dice when we write letters, and it is this predictability which can result in significant savings in the length of transmission, because when we can predict things in advance, we shouldn't need to ask as many yes-or-no questions to define it. But how could we formally model this subtle difference? This question brings us to the key insight in our story. Can you think of what it might be?