Main content

## Modern information theory

# Compression codes

## Video transcript

Voiceover: When we represent
information, such as an image, such as an image, digitally, it means we must slice
it up into tiny chunks. This allows us to send an image as a sequence of color symbols, and these colors can be represented as unique numbers, using some code. Consider the following challenge. Alice and Bob can transmit
and receive messages binary. (Morse code) They charge their
customers 1 penny per bit to use their system, and
a regular customer arrives who wants to send a message, and their messages are 1,000 symbols long. The meaning of the messages is completely unknown. Normally, this is sent
with a standard 2-bit code, which results in charging for 2,000 bits. However, Alice and Bob
already did some analysis on this customer before,
and determined that the probability of each symbol
in the message is different. Can they use these known probabilities to compress the transmission
and increase their profits? What's the optimal coding strategy? David Huffman famously provided the optimal strategy,
which he published in 1952, and based on building a binary
tree from the bottom up. To begin, we can list
all symbols at the bottom which we can call nodes. Then we find the two least probable nodes, in this case B and C, and merge them into one, and add the probabilities together. Then we repeat with the
next two least likely nodes, and continue merging until you have a single node at the top. Finally, we label the edges in this tree with 0 or 1 in any order. Now the code for each
letter is just the path from the top of the tree
to the given letter. So for A, it's just one edge, or 1. This is now known as a Huffman coding, and for examples of the following type you cannot beat it. Go ahead and try. For example, if you shorten the code for D to just 0, then a message
011 could perhaps mean DAA, or maybe just B. So for this to work, you
would need to introduce letter spaces, which
cancel out any savings during transmission. Now, how far does this
compress the message compared to the original 2,000 bits? Well, we just need to calculate the number of bits per letter on average. So we multiply the length of each code times the probability of occurrence, and add them together, which results in an average length of 1.75 bits per symbol on average. That means with this Huffman coding we can expect to compress the messages from 2,000 bits to 1,750 bits. And Claude Shannon was the first to claim that the limit of
compression will always be the entropy of the message source. As the entropy, or uncertainty, of our source decreases due to
known statistical structure, the ability to compress increases. (Morse code) While, if the entropy increases
due to unpredictability, our ability to compress decreases. (Morse code) if we want to compress beyond entropy, we must necessarily throw away information in our messages.