Compression codes

What is the limit of compression? Created by Brit Cruise.

Want to join the conversation?

  • Aaron Williams
    Is audio data pseudo-random from a binary point of view? Is audio compression done by throwing away non-audible frequencies from a fast Fourier transform?
    (17 votes)
    • Cameron
      A lot of audio data is reasonably compressible, which suggests that it would be a poor candidate for randomness (we shouldn't be able to compress "random" data). However, it would be possible to find audio signals that are good candidates for randomness.

      For lossy compression of both images and audio signals, you are basically correct: they use the idea of transforming the signal into the frequency domain and discarding frequencies that won't be noticed. However, the transform that is used is typically the Discrete Cosine Transform (DCT), which is similar to the Discrete Fourier Transform. The DCT is used for .mp3 and .jpg files. (A rough code sketch of this idea appears after this thread.)

      For more details on DCT see:
      http://en.wikipedia.org/wiki/Discrete_cosine_transform

      Hope this makes sense
      (22 votes)
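
      A minimal sketch of that frequency-domain idea, assuming SciPy is available (the block size and the 0.2 threshold below are made-up illustrative values, not settings from any real codec):

      import numpy as np
      from scipy.fft import dct, idct

      # A made-up 64-sample "audio" block: one dominant tone plus a little noise.
      t = np.linspace(0, 1, 64, endpoint=False)
      block = np.cos(2 * np.pi * 3 * t) + 0.05 * np.random.randn(64)

      coeffs = dct(block, norm='ortho')                  # into the frequency domain
      kept = np.where(np.abs(coeffs) > 0.2, coeffs, 0)   # discard weak frequencies (the lossy step)
      approx = idct(kept, norm='ortho')                  # back to the time domain

      print(np.count_nonzero(kept), "of", coeffs.size, "coefficients kept")
      print("max reconstruction error:", np.max(np.abs(approx - block)))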
  • Noble Mushtak
    "So for this to work, you need to introduce letter spaces, which cancel out savings during transmissions."
    How much are letter spaces accounted for in compression codes? They don't seem to be accounted for when the message is compressed from 2000 bits to 1750 bits, yet they do seem to matter when calculating savings. Letter spaces don't seem to be bits, but they give information by showing where one letter ends and another begins, so how would you measure them information-wise?

    "...And Claude Shannon was the first to claim that the limit of compression would be the entropy of the message source."
    This claim is intuitive, but if we compress D further to just 0 (which doesn't seem to create any problems... although that's not a Huffman coding anymore...), wouldn't that compress the message to 1.5 bits per letter, which is smaller than the entropy? Or does that throw away information?
    I think that compressing D to just 0 would cause some kind of problem, because otherwise it contradicts Shannon's claim and is thus counter-intuitive: the smallest average number of bits one letter takes up should be the number of bits of information one letter has, not anything smaller. Also, if I were right, the claim most likely wouldn't even be mentioned without stating, or at least implying, that it wasn't right.
    (12 votes)
    • cenicoleit
      Noble,
      Your confusion is understandable. You've been caught by a parenthetical aside in the lecture.

      Professor Brit says that Huffman coding is optimal. He says, "for examples of the following type. You cannot beat it. Go ahead and try. For example..." Here the aside starts.

      After this short introduction, he simulates an unsuccessful attempt to beat the Huffman coding: he reduces the code for "D" to be only 0. The conclusion of the unsuccessful attempt is that it will not work, because it generates ambiguity (a small demonstration of this follows the thread). He continues talking about the "reduced code", and concludes: "So for this to work (to get around the ambiguity), you'd need to introduce letter spaces, which cancel out any savings during transmission."

      With that last phrase, he closed the aside. The attempt to beat the Huffman coding was unsuccessful. The proposed code (which reduces the code for "D" to be only 0) was not only worse than Huffman's, it was ineffective: the letter spaces would cancel out any savings.

      The rest of the lecture is completely disconnected from this unsuccessful attempt (the proposed code does not even compress the information). The aside and the letter spaces are left behind, and Professor Brit goes on explaining compression with Huffman coding.

      I hope it helps :)
      (6 votes)
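
      To make that ambiguity concrete, here is a tiny sketch (the parses helper is purely illustrative): with A=1, B=011, C=010 and D shortened to just 0, the bits 011 have two valid readings.

      # Code table with D shortened to 0 -- no longer prefix-free, since 0 begins both 010 and 011.
      codes = {'A': '1', 'B': '011', 'C': '010', 'D': '0'}

      def parses(bits, so_far=()):
          """Return every way of splitting `bits` into codewords."""
          if not bits:
              return [so_far]
          results = []
          for symbol, code in codes.items():
              if bits.startswith(code):
                  results += parses(bits[len(code):], so_far + (symbol,))
          return results

      print(parses('011'))   # [('B',), ('D', 'A', 'A')] -- ambiguous without letter spaces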
  • Deven
    Does radio send information using binary digits? Or do the waves cause the receiver to vibrate at a frequency conducive to analog decompression into audio waves?
    (8 votes)
    • prenticedarren
      Yes! It does now, but traditional radio was born out of the latter. Since FM can involve several frequencies at once, it allowed for developments such as stereo audio and subcarriers that transmit digital information through the waves (e.g. text info for digital receivers). Nowadays, through a service called "HD Radio" from iBiquity, digital audio data is sent alongside the standard analog signal for those who have digital receivers that can decode the information. The digital signal has a much cleaner sound, as the receiver is effectively judging 100% yes-or-no decisions, throwing out all the low-level junk noise/distortion from the analog airwaves.
      (8 votes)
  • Rey #FilmmakerForLife #EstelioVeleth.
    What does "entropy" mean?
    (6 votes)
    • jon.email
      Entropy is a measure of disorder or randomness. In information theory, it specifically refers to the average amount of information in each message, which increases as the messages get more random.

      For example, the machine in the videos that is equally likely to give you four symbols has more entropy than a machine that gives different symbols at different rates. I'd suggest rewatching the Information Entropy video. (A small worked calculation follows this thread.)

      Entropy is often used in other areas of science, especially thermodynamics. It's the measurement of disorder or chaos in a system, and relates to the ability of a system to do useful work. Over time, entropy always increases in a closed system as the system becomes more disorganized.

      For example, if you drop food colouring into a glass of water, the food colouring starts off fairly ordered (all in the spot you dropped it). However, disorder and entropy will increase over time: the food colouring will slowly spread out until it is equally randomly dispersed throughout the water.
      (7 votes)
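
      As a small worked example of the information-theory sense (the two four-symbol machines below are made up just for contrast):

      import math

      def entropy(probs):
          """Shannon entropy: the average number of bits of information per symbol."""
          return -sum(p * math.log2(p) for p in probs)

      print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0   -- four equally likely symbols, maximum uncertainty
      print(entropy([0.7, 0.1, 0.1, 0.1]))       # ~1.36 -- a more predictable machine, lower entropy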
  • David
    The probability distribution of the four letters in the video worked out well in that 2*12.5 = 25 and 2*25 = 50. But, what if the probabilities weren't such clean multiples of each other? What if you had probabilities of 0.41, 0.31, 0.17, and 0.11 respectively for A, B, C, and D? How do you group the nodes when their probabilities don't imply obvious divisions?
    (3 votes)
    • Cameron
      Here's how you would generate a Huffman coding for the following nodes:
      A=0.41, B=0.31, C=0.17, D=0.11

      Find the two least probable nodes: C and D
      Group them together and create a new node, whose probability is their sum:
      We create: Node (C,D) with probability 0.17+ 0.11 = 0.28

      Now we have:
      A=0.41, B=0.31, (C,D)=0.28


      Find the two least probable nodes: B and (C,D)
      Group them together and create a new node, whose probability is their sum:
      We create: Node (B,(C,D)) with probability 0.31+ 0.28 = 0.59

      Now we have:
      A=0.41, (B,(C,D))=0.59

      Find the two least probable nodes: A and (B,(C,D))
      Group them together and create a new node, whose probability is their sum:
      We create: Node (A,(B,(C,D))) with probability 0.41+ 0.59 = 1.00

      We have just created this Huffman Coding tree

              *
             / \
            A   *
               / \
              *   B
             / \
            D   C


      or with probabilities shown:
      (A,(B,(C,D)))=1.00
          /        \
      A=0.41    (B,(C,D))=0.59
                  /        \
            (C,D)=0.28    B=0.31
              /     \
          D=0.11   C=0.17


      If we use 0 for each left branch and 1 for each right branch the code is
      A is 0 (length of 1 bit)
      B is 11 (length of 2 bits)
      C is 101 (length of 3 bits)
      D is 100 (length of 3 bits)

      You can try entering this distribution into this program:
      https://www.khanacademy.org/computer-programming/huffman-coding/6047325206085632


      The average bit size per symbol is:
      0.41 * 1 + 0.31 * 2 + 0.17 * 3 + 0.11 * 3 = 1.87

      The entropy of the probability distribution is:
      -0.41* log_2(0.41) -0.31* log_2(0.31) -0.17* log_2(0.17) -0.11 * log_2(0.11) = 1.836

      The difference between the entropy of the probability distribution and our average size
      per symbol is caused by the probabilities not being powers of 2.

      However, this is the best we can do using a prefix free code (assuming that we stick to binary
      branching and only encode one symbol at a time), since the Huffman code is the provably
      optimal prefix free code.


      If instead we tried the distribution in the video with
      A= 1/2 B= 1/8 C=1/8 D= 1/4

      The average bit size per symbol is:
      1/2 * 1 + 1/4 * 2 + 1/8 * 3 + 1/8 * 3 = 1.75

      The entropy of the probability distribution is:
      -1/2* log_2(1/2) -1/4* log_2(1/4) -1/8 * log_2(1/8) -1/8 * log_2(1/8) = 1.75

      Here, the average bit size per symbol actually matches the entropy of the probability
      distribution (because we used nice powers of two for our probabilities).


      So if our probabilities are powers of 2, our Huffman coding will be perfect; if not, it won't be, but we still won't be able to find a better prefix free code (assuming that we stick to binary branching and only encode one symbol at a time). (A short code sketch of this greedy grouping follows this thread.)

      Hope this makes sense
      (5 votes)
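
      Here is a minimal code sketch of the greedy grouping described above (the huffman_code helper and its heap-based bookkeeping are my own illustration, not from the video or the program linked above):

      import heapq
      from itertools import count

      def huffman_code(probs):
          """Repeatedly merge the two least probable nodes, prepending 0/1 to their codes."""
          ties = count()   # tie-breaker so the heap never has to compare the dicts
          heap = [(p, next(ties), {symbol: ''}) for symbol, p in probs.items()]
          heapq.heapify(heap)
          while len(heap) > 1:
              p1, _, left = heapq.heappop(heap)    # two least probable nodes...
              p2, _, right = heapq.heappop(heap)
              merged = {s: '0' + c for s, c in left.items()}
              merged.update({s: '1' + c for s, c in right.items()})
              heapq.heappush(heap, (p1 + p2, next(ties), merged))   # ...become one grouped node
          return heap[0][2]

      probs = {'A': 0.41, 'B': 0.31, 'C': 0.17, 'D': 0.11}
      code = huffman_code(probs)
      print(code)                                              # codeword lengths: A=1, B=2, C=3, D=3 bits, as above
      print(sum(p * len(code[s]) for s, p in probs.items()))   # about 1.87 bits per symbol on average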
  • Alan Mainwaring
    This is really tricky stuff; Shannon was an incredibly deep thinker. I know this is not a question, but these videos are the best I have seen, and I am going to have to go over these quite a few times. I wonder if this material may be the foundation of explaining the weirdness of Quantum Mechanics in a more "realistic" way?
    (4 votes)
  • Collin Mendia
    Why can we number the edges in any order we want? Wouldn't this cause confusion on the other end?
    (3 votes)
  • crccpete
    What is the prefix used to say "This is a new character"?
    (2 votes)
    • thatisthat
      If you're using Huffman coding, you don't need a prefix to say "This is a new character," because Huffman coding produces a prefix-free code. Therefore, no code for any symbol will be at the beginning of the code for another symbol.

      In the video, the code is A=1, B=011, C=010, and D=00. Try decoding the following message from left to right. You'll find there is no ambiguity!
      0101011
      The first digit is 0, which doesn't correspond to a code. The second digit is 1, so now we're working with 01, which still doesn't correspond to a code. The third digit is 0, so now we're working with 010 = C! The next digit is 1, which corresponds to A, and so on, to get CAB. You can see there is no need to signal "this is a new character" with a prefix-free code. (A small decoder sketch follows this thread.)
      (2 votes)
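
      A minimal sketch of that left-to-right process, using the code from the video (the decode helper is just for illustration):

      def decode(bits, codes):
          """Read bits left to right, emitting a symbol whenever the buffer matches a codeword."""
          reverse = {code: symbol for symbol, code in codes.items()}
          out, buffer = [], ''
          for bit in bits:
              buffer += bit
              if buffer in reverse:        # a prefix-free code guarantees this match is unambiguous
                  out.append(reverse[buffer])
                  buffer = ''
          return ''.join(out)

      video_code = {'A': '1', 'B': '011', 'C': '010', 'D': '00'}
      print(decode('0101011', video_code))   # CAB -- no spaces or "new character" markers needed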
  • aarav.trivedi12
    I do not get why we have to add spaces.
    (2 votes)
  • moldbreaker
    So this Huffman binary tree seems really good at storing information efficiently, without the use of separators (like spaces). Nevertheless, aren't separators really useful for quickly accessing a certain symbol?

    Let me rephrase this question:
    Consider the following code (using the encoding from the video):
    100101001110110111

    This code translates to addacbabba if read from left to right. But what if you wanted to start somewhere in the middle, e.g. with the 5th symbol? Is there any way to systematically know where the 5th symbol starts (without decoding from left to right)?
    (1 vote)
    • Cameron
      Since the symbols have different lengths, you need to look at almost all of the bits to determine where the 5th symbol starts.

      This should illustrate why we are forced to look at almost all of the bits to find out where the 5th symbol starts. Imagine we skipped over just 3 bits. Those 3 bits might have been:
      1 symbol, e.g. 011
      1 symbol plus part of another symbol, e.g. 000 (00 is D, and the trailing 0 must be followed by 0, 10, or 11)
      2 symbols, e.g. 001
      3 symbols, e.g. 111

      So even skipping over just 3 bits makes it impossible to tell where the 5th symbol starts. (A small sketch of this left-to-right scan follows this thread.)
      (2 votes)
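
      A small sketch of that forced left-to-right scan (the symbol_starts helper is my own illustration): to learn where any later symbol begins, the decoder has to walk every earlier bit.

      def symbol_starts(bits, codes):
          """Decode left to right, recording the bit position where each symbol begins."""
          reverse = {code: symbol for symbol, code in codes.items()}
          starts, buffer, begin = [], '', 0
          for i, bit in enumerate(bits):
              if not buffer:
                  begin = i                # a new symbol starts at this bit
              buffer += bit
              if buffer in reverse:
                  starts.append((begin, reverse[buffer]))
                  buffer = ''
          return starts

      video_code = {'A': '1', 'B': '011', 'C': '010', 'D': '00'}
      print(symbol_starts('100101001110110111', video_code))
      # Each (position, symbol) pair is only known after examining every bit before it,
      # which is why there is no shortcut to "the 5th symbol" without an external index.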

Video transcript

Voiceover: When we represent information, such as an image, digitally, it means we must slice it up into tiny chunks. This allows us to send an image as a sequence of color symbols, and these colors can be represented as unique numbers, using some code.

Consider the following challenge. Alice and Bob can transmit and receive messages in binary. (Morse code) They charge their customers 1 penny per bit to use their system, and a regular customer arrives who wants to send a message. Their messages are 1,000 symbols long, and the meaning of the messages is completely unknown. Normally, this is sent with a standard 2-bit code, which results in charging for 2,000 bits. However, Alice and Bob already did some analysis on this customer before, and determined that the probability of each symbol in the message is different. Can they use these known probabilities to compress the transmission and increase their profits? What's the optimal coding strategy?

David Huffman famously provided the optimal strategy, which he published in 1952, and which is based on building a binary tree from the bottom up. To begin, we can list all symbols at the bottom, which we can call nodes. Then we find the two least probable nodes, in this case B and C, and merge them into one, and add the probabilities together. Then we repeat with the next two least likely nodes, and continue merging until we have a single node at the top. Finally, we label the edges in this tree with 0 or 1 in any order. Now the code for each letter is just the path from the top of the tree to the given letter. So for A, it's just one edge, or 1.

This is now known as a Huffman coding, and for examples of the following type you cannot beat it. Go ahead and try. For example, if you shorten the code for D to just 0, then a message 011 could perhaps mean DAA, or maybe just B. So for this to work, you would need to introduce letter spaces, which cancel out any savings during transmission.

Now, how far does this compress the message compared to the original 2,000 bits? Well, we just need to calculate the number of bits per letter on average. So we multiply the length of each code times the probability of occurrence, and add them together, which results in an average length of 1.75 bits per symbol. That means with this Huffman coding we can expect to compress the messages from 2,000 bits to 1,750 bits.

And Claude Shannon was the first to claim that the limit of compression will always be the entropy of the message source. As the entropy, or uncertainty, of our source decreases due to known statistical structure, the ability to compress increases. (Morse code) While, if the entropy increases due to unpredictability, our ability to compress decreases. (Morse code) If we want to compress beyond entropy, we must necessarily throw away information in our messages.
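
As a quick check of the figures in the transcript, here is the average-bits-per-symbol calculation written out as a short sketch (using the video's probabilities A=1/2, D=1/4, B=C=1/8 and codes A=1, D=00, B=011, C=010):

import math

probs = {'A': 0.5, 'D': 0.25, 'B': 0.125, 'C': 0.125}
codes = {'A': '1', 'D': '00', 'B': '011', 'C': '010'}

avg_bits = sum(p * len(codes[s]) for s, p in probs.items())   # expected code length per symbol
entropy = -sum(p * math.log2(p) for p in probs.values())      # Shannon's limit for this source

print(avg_bits, entropy)    # 1.75 1.75 -- the Huffman code meets the entropy limit here
print(1000 * avg_bits)      # 1750.0 bits to send the 1,000-symbol message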