
A mathematical theory of communication

Claude Shannon demonstrated how to generate "English-looking" text using Markov chains. Created by Brit Cruise.

Want to join the conversation?

  • hunteryoung
    At , how does Shannon decide where to separate the random letters into words? On a related note, I'm still not quite understanding how you could generate a two-letter word using a third-order approximation.
    (20 votes)
    • brit cruise
      Good question!
      Spaces can be treated like any other symbol (I simply didn't use any in my examples)

      With a third-order approximation you could use a space as one of the symbols. This could result in two-letter words, as follows:

      We know that "is" is almost always followed by " " (a space), which would result in many "IS " trigrams (I, S, space) in that state.
      (31 votes)
  • christinabarta1
    At , he creates pairs of possible letter combinations but repeats many tiles that begin with A, especially AA tiles. I don't understand why he includes so many tiles in the A cup and only three tiles in the B and C cups.
    Is this because A needs to occur more often for the pattern to look like English writing?
    (7 votes)
    • Noble Mushtak
      This is not about English, but rather a language Claude Shannon made up.

      Second-Order Approximation:
      Brit Cruise/Claude Shannon took this message:
      acaaaabcbbaacc
      broke it up into pairs:
      ac|aa|aa|bc|bb|aa|cc
      then broke it up into pairs in a different way, starting one letter later:
      (Ignore a)|ca|aa|ab|cb|ba|ac|(Ignore c)
      then took all the pairs and put them into cups depending on which letter they started with (I've rearranged them below):
      A Cup
      ac
      aa
      aa
      aa
      aa
      ab
      ac
      B Cup
      bc
      bb
      ba
      C Cup
      cc
      ca
      cb

      As you can see, it's just a coincidence that each possible pair starting with B or C appears exactly once; if the message had contained more of them, Brit Cruise and Claude Shannon would have listed multiple copies under the B and C cups.

      I hope this clarifies that part of the video!
      (19 votes)
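
For readers who want to experiment with the cup-building procedure described in the reply above, here is a minimal Python sketch. It slides along the message, collects every overlapping pair, groups the pairs into cups by their first letter, and then generates new text by repeatedly drawing a tile and moving to the cup named by its second letter. The message string and function names are illustrative, not taken from the video.

    import random
    from collections import defaultdict

    def build_cups(message):
        # Group every overlapping pair of symbols by the symbol it starts with.
        cups = defaultdict(list)
        for i in range(len(message) - 1):
            pair = message[i:i + 2]
            cups[pair[0]].append(pair)
        return cups

    def generate(cups, length):
        # Draw a tile, write down its first letter, then move to the cup named
        # by its second letter, and repeat. (Assumes every symbol also begins
        # some pair, which holds for this example message.)
        state = random.choice(list(cups))
        output = []
        for _ in range(length):
            pair = random.choice(cups[state])
            output.append(pair[0])
            state = pair[1]
        return "".join(output)

    cups = build_cups("acaaaabcbbaacc")              # the message from the reply above
    print({k: sorted(v) for k, v in cups.items()})   # the A cup is dominated by 'aa' tiles
    print(generate(cups, 20))
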
  • pranav.kirsur
    What are the uses of this?
    (2 votes)
    • Noble Mushtak
      One use of approximating the English language could be to test a person's English comprehension: these randomly generated examples would ensure that the reader has never seen the text before.

      Also, artificial intelligence could be built by approximating English based on text the computer has seen or heard previously. This might lead to computers holding conversations with each other or with humans.

      I hope this gives you some ideas of practical uses of approximating English!
      (7 votes)
  • paul.ebreo
    My second set of questions (see Part 1 below):

    Part 2 - Questions about 3-gram algorithm and transitioning from one cup to another

    First, is the 3rd word in a 3-gram algorithm dependent on the 1st word or a combination of the 1st and 2nd word? Given the following occurrences:

    the man ate
    the man ate
    a man ate

    So here the 3rd word 'ate' is more likely to come from 'the man' than 'a man', correct? So you would put it in different cups like this:

    Cup 'the man'
    the man ate
    the man ate

    Cup 'a man'
    a man ate

    Last question: For both 2-gram and 3-gram algorithms, when you pick a phrase from a cup, what happens when the last word does not match another cup name? So from above, if I pick 'a man ate', the last word 'ate' does not match another cup name. In that case, I'd just have to pick a random cup, right?
    (3 votes)
    • brit cruise
      Hey Paul!
      Remember that to construct our cups (or states) we need some input text:

      using your example: "the man ate the man ate a man ate"

      Then we need to slide along and capture every possible pair of words.

      In this case we would need *5* cups, one for each of:
      The man
      man ate
      ate the
      ate a
      a man

      So, you will *never* have an occurrence where you pick a word that doesn't belong to another state. See above: we have a "man ate" cup.

      Let me know if this is clear!
      (3 votes)
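
To make the sliding-window answer above concrete, here is a minimal Python sketch using the same example sentence. Each pair of consecutive words names a cup (state), and the cup holds whatever word followed that pair in the text; the variable names are just for illustration.

    from collections import defaultdict

    words = "the man ate the man ate a man ate".split()

    # Slide along and capture every overlapping pair of words; each pair names
    # a cup, and the cup collects the word that followed that pair in the text.
    cups = defaultdict(list)
    for i in range(len(words) - 2):
        cups[(words[i], words[i + 1])].append(words[i + 2])

    for state in sorted(cups):
        print(state, cups[state])
    # ('a', 'man')   ['ate']
    # ('ate', 'a')   ['man']
    # ('ate', 'the') ['man']
    # ('man', 'ate') ['the', 'a']
    # ('the', 'man') ['ate', 'ate']
    # Every word drawn from a cup extends the last pair into another pair that
    # is itself a cup, so generation never gets stuck on this text.
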
  • The Onion Router
    What book does he keep marking in? I might want to read it.
    (4 votes)
  • paul.ebreo
    Brit, these Markov videos are so interesting, they really made me want to learn a lot more about n-grams and how to apply them to English words! But I have so many questions that I'm splitting them into two parts.

    Part 1 - Terminology and 2-gram

    First, the terminology. Does 2-gram mean the second word of word n, e.g. the 2-gram of 'the man' is 'man'? Or is it better to just call it the 2nd word?

    Next, the application of the n-gram algorithms. How would you apply a 2-gram algorithm to English words? So given the following word pairs:

    a man
    the man
    hey man
    a dog
    the dog

    Here, the n=2 refers to the words 'man' and 'dog', correct? Consequently, we would have 3 cups: 'a', 'the' and 'hey', is that right? Giving the following cups:

    Cup 'a'
    a dog
    a man

    Cup 'the'
    the dog
    the man

    Cup 'hey'
    hey man

    I'm pretty sure that's how you would do it.
    (3 votes)
    • paul.ebreo
      Here are answers to my own questions:

      Terminology:
      n-gram - the slice of words (or letters)
      Order - the number of units in each slice
      Level - the units of the n-gram (letters or words)
      Example: a letter-level n-gram at 3rd order looks like:

      dba
      bbd

      a word-level n-gram at 4th order looks like:

      "jim is very happy"
      "oven dog candy house"

      Determining how many cups. Answer: the number of cups depends on how many unique slices there are. So if you have "the man ate food", with a word-level 2nd-order n-gram you will have the following 3 cups:

      the man
      man ate
      ate food
      (3 votes)
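
As a rough illustration of this terminology, here is a small Python sketch that slices a sequence into n-grams of a given order, at either the letter level or the word level. The function name is hypothetical.

    def n_grams(units, order):
        # Every overlapping slice of `order` consecutive units.
        return [tuple(units[i:i + order]) for i in range(len(units) - order + 1)]

    # Letter-level, 3rd order: the units are individual characters.
    print(n_grams(list("dbabbd"), 3))

    # Word-level, 2nd order: the units are words, as in "the man ate food".
    slices = n_grams("the man ate food".split(), 2)
    print(slices)            # [('the', 'man'), ('man', 'ate'), ('ate', 'food')]
    print(len(set(slices)))  # 3 unique slices, hence 3 cups
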
  • Daniel Mcbride
    If synaptic nerves are electrical signals, wouldn't they produce radio signals, which in turn could be deciphered via technology even before the verbal enunciation of syntax?
    (3 votes)
  • Alex
    At , is "improvementagement" supposed to be a real word?
    (2 votes)
  • seanmcolby
    How does selection in tri-grams (or higher order n-grams) work? With a 2nd order approximation, one writes the first letter in the pair, and uses the second as an index to select from the next "cup."

    In a tri-gram, would the bins (cups) be arranged in a matrix? Where each bin is assigned two indices? In that case, the first letter one writes down, the second letter selects a column, the third letter selects a row, and one draws from the cup with said coordinate (i.e., row-column position)?
    (3 votes)
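
One common way to realize the arrangement described in this question is to key each cup by the previous two symbols, which amounts to the same thing as a matrix indexed by row and column. Here is a minimal Python sketch of that idea; it illustrates the general technique, not necessarily how the video's examples were built.

    import random
    from collections import defaultdict

    def build_trigram_cups(message):
        # Group every trigram by its first two symbols (the "row/column" indices).
        cups = defaultdict(list)
        for i in range(len(message) - 2):
            cups[message[i:i + 2]].append(message[i + 2])
        return cups

    def generate(cups, length):
        state = random.choice(list(cups))      # pick a starting pair
        output = list(state)
        while len(output) < length:
            state = output[-2] + output[-1]
            if state not in cups:              # the pair only occurred at the very end of the text
                state = random.choice(list(cups))
                output.extend(state)
            else:
                output.append(random.choice(cups[state]))
        return "".join(output[:length])

    cups = build_trigram_cups("acaaaabcbbaacc")
    print(generate(cups, 20))
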
  • antixmehra
    How do these Markov chains relate to Moore and Mealy machines?
    (2 votes)

Video transcript

Voiceover: Shannon had just finished developing his theories related to cryptography, and therefore was well aware that human communication was a mix of randomness and statistical dependencies. Letters in our messages were obviously dependent on previous letters to some extent. In 1949, he published a groundbreaking paper, "A Mathematical Theory of Communication". In it, he uses Markov models as the basis for how we can think about communication.

He starts with a toy example. Imagine you encounter a bunch of text written in an alphabet of A, B, and C. Perhaps you know nothing about this language, though you notice As seem to clump together, while Bs and Cs do not. He then shows that you could design a machine to generate similar-looking text using a Markov chain. He starts off with a zeroth-order approximation, which means we just independently select each symbol, A, B, or C, at random and form a sequence. However, notice that this sequence doesn't look like the original. He then shows you could do a bit better with a first-order approximation, where the letters are chosen independently, but according to the probability of each letter in the original sequence. This is slightly better, as As are now more likely, but it still doesn't capture much structure.

The next step is key. A second-order approximation takes into account each pair of letters which can occur. In this case, we need three states. The first state represents all pairs which begin with A, the second all pairs that begin with B, and the third state all pairs that begin with C. Notice now that the A cup has many AA pairs, which makes sense, since the conditional probability of an A after an A is higher in our original message. We can generate a sequence using this second-order model easily, as follows: we start anywhere and pick a tile, write down the first letter as our output, and move to the cup defined by the second letter. Then we pick a new tile and repeat this process indefinitely. Notice that this sequence is starting to look very similar to the original message, because this model is capturing the conditional dependencies between letters. If we wanted to do even better, we could move to a third-order approximation, which takes into account groups of three letters, or "trigrams". In this case, we would need nine states.

But next, Shannon applies this exact same logic to actual English text, using statistics that were known for letters, pairs, and trigrams, etc. He shows the same progression from zeroth-order random letters to first-order, second-order, and third-order sequences. He then goes on and tries the same thing using words instead of letters, and he writes that "the resemblance to ordinary English text increases quite noticeably at each depth." Indeed, these machines were producing meaningless text, though they contained approximately the same statistical structure you'd see in actual English. Shannon then proceeds to define a quantitative measure of information, as he realizes that the amount of information in some message must be tied up in the design of the machine which could be used to generate similar-looking sequences. Which brings us to his concept of entropy.
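
For readers who want to experiment with the progression described above, here is a minimal Python sketch of the zeroth- and first-order approximations; the second-order (pair) and third-order (trigram) steps are sketched earlier on this page. The message string is just a stand-in for whatever text you observe.

    import random
    from collections import Counter

    message = "acaaaabcbbaacc"          # stand-in for the observed text
    alphabet = sorted(set(message))

    # Zeroth-order approximation: each symbol is chosen uniformly at random,
    # ignoring the statistics of the original text.
    zeroth = "".join(random.choice(alphabet) for _ in range(20))

    # First-order approximation: symbols are still chosen independently, but
    # weighted by how often each one appears in the original message.
    counts = Counter(message)
    symbols = sorted(counts)
    first = "".join(random.choices(symbols, weights=[counts[s] for s in symbols], k=20))

    print(zeroth)
    print(first)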