
Video transcript

Voiceover: When we represent information, such as an image, digitally, it means we must slice it up into tiny chunks. This allows us to send an image as a sequence of color symbols, and these colors can be represented as unique numbers, using some code.

Consider the following challenge. Alice and Bob can transmit and receive messages in binary. (Morse code) They charge their customers one penny per bit to use their system. A regular customer arrives who wants to send messages that are 1,000 symbols long, and the meaning of the messages is completely unknown. Normally, this is sent with a standard 2-bit code, which results in charging for 2,000 bits. However, Alice and Bob have already done some analysis on this customer, and determined that the probabilities of the symbols in the messages differ. Can they use these known probabilities to compress the transmission and increase their profits? What's the optimal coding strategy?

David Huffman famously provided the optimal strategy, which he published in 1952, and it is based on building a binary tree from the bottom up. To begin, we list all symbols at the bottom, which we can call nodes. Then we find the two least probable nodes, in this case B and C, merge them into one, and add their probabilities together. Then we repeat with the next two least likely nodes, and continue merging until we have a single node at the top. Finally, we label the edges in this tree with 0 or 1, in any order. Now the code for each letter is just the path from the top of the tree down to that letter. So for A, it's just one edge, or 1.

This is now known as Huffman coding, and for a source like this, you cannot beat it. Go ahead and try. For example, if you shorten the code for D to just 0, then the message 011 could perhaps mean DAA, or maybe just B. For this to work, you would need to introduce letter spaces, which would cancel out any savings during transmission.
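The bottom-up merging procedure described above can be sketched in a few lines of Python. This is a minimal sketch, not the exact construction from the video; the symbol probabilities (A = 0.5, D = 0.25, B = 0.125, C = 0.125) are an assumption consistent with the example, since the transcript only states that B and C are the least probable and that A ends up with a one-bit code:

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code bottom-up: repeatedly merge the two least
    probable nodes until a single node remains, reading codes off the tree."""
    # Each heap entry: (probability, tiebreaker, {symbol: code-so-far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # least probable node
        p2, _, codes2 = heapq.heappop(heap)  # next least probable node
        # Label one branch 0 and the other 1 (the order is arbitrary),
        # then merge the two nodes and add their probabilities together.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        count += 1
        heapq.heappush(heap, (p1 + p2, count, merged))
    return heap[0][2]

# Assumed probabilities, consistent with the example in the video:
probs = {"A": 0.5, "D": 0.25, "B": 0.125, "C": 0.125}
codes = huffman_code(probs)
```

With these probabilities, A gets a 1-bit code, D a 2-bit code, and B and C 3-bit codes, matching the tree built in the video (the exact 0/1 labels may differ, since the edges can be labeled in any order).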
Now, how far does this compress the message compared to the original 2,000 bits? Well, we just need to calculate the average number of bits per letter. So we multiply the length of each code by its probability of occurrence, and add them together, which results in an average length of 1.75 bits per symbol. That means with this Huffman coding, we can expect to compress the messages from 2,000 bits to 1,750 bits.

Claude Shannon was the first to claim that the limit of compression will always be the entropy of the message source. As the entropy, or uncertainty, of our source decreases, due to known statistical structure, the ability to compress increases. (Morse code) While if the entropy increases, due to unpredictability, our ability to compress decreases. (Morse code) If we want to compress beyond entropy, we must necessarily throw away information in our messages.
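The average-length calculation and Shannon's entropy limit can both be checked numerically. This is a small sketch assuming the same probabilities as before (A = 0.5, D = 0.25, B = 0.125, C = 0.125) and the code lengths from the tree; because these probabilities happen to be powers of one half, the Huffman code exactly hits the entropy limit here:

```python
import math

# Assumed probabilities and code lengths from the example:
probs = {"A": 0.5, "D": 0.25, "B": 0.125, "C": 0.125}
code_lengths = {"A": 1, "D": 2, "B": 3, "C": 3}

# Average bits per symbol: sum of (code length x probability).
avg_bits = sum(p * code_lengths[s] for s, p in probs.items())  # 1.75

# Shannon entropy H = -sum(p * log2(p)): the compression limit per symbol.
entropy = -sum(p * math.log2(p) for p in probs.values())  # also 1.75 here
```

So a 1,000-symbol message averages 1,000 x 1.75 = 1,750 bits, down from 2,000, and no lossless code for this source can do better than the entropy.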