If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

AP®︎/College Computer Science Principles

Unit 1: Lesson 6

Lossless data compression

Lossless image compression

AP.CSP:
DAT‑1 (EU)
,
DAT‑1.D (LO)
,
DAT‑1.D.1 (EK)
,
DAT‑1.D.2 (EK)
,
DAT‑1.D.3 (EK)
Images are all around us, from application icons to animated GIFs to photos. Image files can take up a lot of space, so computers employ a range of algorithms to compress image files.
For the simplest of images, computers can use a compression algorithm called run-length encoding (RLE).

Bitmaps

Before we explore image compression, let's see how we can represent an image in binary without any compression.
Here's a simple image, a 16x16 heart icon:
A red heart icon with a white background
Let's zoom in and overlay a grid on top, so that it's easy to see exactly which pixels are red and which pixels are white:
A 16x16 grid of pixels with the heart icon. The heart is made of red pixels and the background are white pixels.
The heart icon is made up of only two colors, red and white, so a computer could represent it in binary by mapping red pixels to 1 and white pixels to 0. This is called a bitmap, since it's mapping pixels to bits.
Using this method, the heart icon would be represented like so:
0001100000110000
0011110001111000
0111111011111100
1111111111111110
1111111111111110
1111111111111110
1111111111111110
1111111111111110
0111111111111100
0011111111111000
0001111111110000
0000111111100000
0000011111000000
0000001110000000
0000000100000000
Imagine that you had to read the bits above out to someone who was copying them down. After a while, you might say things like "five zeroes" instead of "zero zero zero zero zero". Well, the computer can do that too...

RLE compression algorithm

In run-length encoding, the computer replaces each row with numbers that say how many consecutive pixels are the same color, always starting with the number of white pixels.
For example, the first row contains 3 white pixels, 2 red pixels, 5 white pixels, 2 red pixels, then 4 white pixels:
0001100000110000
This would be represented as follows:
3,2,5,2,4
The fourth row is interesting because it starts with a red pixel. Run-length encodings start with the number of white pixels, so this is how it'd be represented:
0,15,1
Here's the second row of the bitmap for the heart icon:
0011110001111000
How would that row be represented in RLE?

RLE decompression

When a computer uses run length encoding, it should be able to perfectly recreate the image from the compressed representation—and so should we, if we follow the computer's strategy.
Let's try it. Here's a representation of a black & white icon using RLE:
4, 9, 3
4, 7, 2, 1, 2
4, 7, 2, 1, 2
4, 9, 3
4, 7, 5
4, 7, 5
5, 5, 6
0, 15, 1
1, 13, 2
The first row has 4 white pixels, then 9 black pixels, then 3 white pixels. That looks like:
A row of 16 pixels, starting with 4 white, then 9 black, then 3 white.
The next row has 4 white pixels, then 7 black, 2 white, 1 black, and 2 white. That looks like:
A row of 16 pixels, starting with 4 white, then 7 black, then 2 white, then 1 black, and finally 2 white.
When we keep going, the final icon is a cup and saucer:
A grid that is 16 pixels wide by 19 pixels high, where the pixels are filled in black in the shape of a cup and saucer.
The following is a compression of a 6x6 black and white icon, using RLE:
2,2,2
2,2,2
0,6
0,6
2,2,2
2,2,2
What mathematical symbol does that icon resemble?

Compression ratio

We've claimed that run-length encoding can save us space when storing simple images—but how much space?
To find out, I wrote a program to encode black & white bitmaps with RLE. This table summarizes the results on three heart icons of increasing size:
ImageDimensionsUncompressedAfter RLESpace savings
Small black & white heart icon
16x1625622810.9%
Medium black & white heart icon
32x32102453248.0%
Large black & white heart icon
128x12816384289882.3%
Take a look at that final column, the space savings. Notice a pattern? We save much more space as the size increases since the runs are much longer.
What about images of the same size? This table summarizes the results of compressing three large icons with RLE:
ImageDimensionsUncompressedAfter RLESpace savings
Large black & white heart icon
128x12816384289882.3%
Large black & white icon of caduceus (snakes on a pole with wings)
128x12816384829849.4%
Large black & white icon of a crown
128x12816384873046.7%
Now you can see why I picked a heart icon as my example: it compresses very well, thanks to its many runs of black or white. RLE compression still halves the size of the other icons, but it doesn't save nearly as much space.
In fact, sometimes RLE can't save any space at all...

Limits of RLE

An icon made up of random colors for every pixel.
Let's zoom in, so we can visualize each pixel:
A grid of 16x16 pixels. Each pixel is a random color.
Every pixel in that icon is a different color, and there are no runs.
Run-length-encoding can't compress an image like that at all. That's an example that I made up just for this article (generated by this program), so it might not seem that common.
As it turns out, photographic images are similar to that icon—the real world is filled with details that interrupt runs.
The Khan Academy team page includes this lovely photo of a dog looking at a computer screen:
A photo of a dog staring intently at a computer screen.
At normal resolution, it looks like there are blocks of similar color, like in the dog's fur or the grey in the computer screen.
Let's zoom in to the pixels:
A pixelated grid zoomed into the snout of a dog looking at a computer screen. The pixels are blue-ish and orange-ish, but they are a large number of shades of blue and orange.
Now you can see that even the seemingly simple computer screen is a vast array of similar-but-not-quite-the-same colors. A run-length-encoding of the pixels won't do much to reduce the file size.

Uses for RLE

RLE compression was a very popular technique when most computer images were icons with limited color palettes.
These days, our images are more complex and don't contain as many runs of the same color.
Where is RLE used today? Fax machines still use RLE for compressing the faxed documents, since they only need to represent black and white letters. JPEG images do use RLE during a final stage of compression, but they first use a more complex algorithm to compress the photographic details. We'll explore that more soon.

Want to join the conversation?

• Sooo...unless two or more consecutive pixels have the same exact colour...they cannot be compressed?
• Specifically with Run length Encoding compression algorithms, that is true. The article says that other compression methods exist.
• If the average row is less than 8 pixels, how can it be compressed using numbers? Using the example in your article 0011110001111000 is 2,4,3,4,3. If any number/letter takes 8 bits, why wouldn't the compressed picture be 40 bits vs 16 bit for the uncompressed picture?
• The both of you are correct. The smaller the file is the less efficient the compression becomes. IF you look at the table with the compression ratio, you see that the compression ratio for a 16x16 bit image is just a little bit over 10%. At 8x8 bit you'd probably need more storage for the compressed image.
The same problem applies if the picture is complex, the more complex the image is the less efficient the RLE compression becomes. Or as you've figured out it might actually increase the size of the file.
• When RLE is used to replace 00001111 with (4,4), the "4"s still need to be converted to binary for storage, right? So in that case wouldn't you need to use 2 bytes of storage to represent the two "4"s (00000100,00000100)? I guess I'm not seeing where the storage space savings come from when the computer has to store the replacement symbol as binary code.
• That's a really good question.

Those are conceptual examples meant to show the general idea of run length encoding, in a way that makes sense to a human. Your computer needs more than 1 bit to represent a pixel and that's when the savings come in.
• Why are there 16 rows in the heart grid but only 15 bitmaps?
• Huh, nice catch!
Looks like one line got accidently dropped in the translation (between line 9 and 10).
• When you have more than 2 colors, and therefore need more than 1 binary digit to represent colors, do computers use hexadecimal numbers like (FF,FF,FF) as mentioned earlier in the course?
• The RGB encoding is most commonly used I think. But you could also use a different encoding.
• seeing RLE numbers, how to know that pixel is white or black? no any hint for it
• The first number in a line stands for white, the next for black, then white again and so on
• If two or more of the color is, so should be how to compress?
• Say Red = R, Blue = Blue, Green = G and you have the picture

RRRRGGGBBBBR

where every color stands for a pixel of that color, then you could encode it to

4R,3G,4B,1R
• How is it possible that objects of the same dimensions have different values after compression by RLE and different space savings. I don't quite understand it .
• RLE compresses consecutive pixels of the same color, meaning that if same-colored pixels are repeated after one another, they can be compressed into a single number.

In the red heart bitmap, rows 1-4 can be compressed to the following:
Row 1 --> 3, 2, 5, 2, 4
Row 2 --> 2, 4, 3, 4, 3
Row 3 --> 1, 6, 1, 6, 2
Row 4 --> 15, 1
Where the first column represents the number of repeated white pixels, the second column represents repeated red pixels, the third column white, and so on.

Some images can be compressed more than other images of the same size depending on how much of the same pixels are repeated in a row.

For example the 128x128 heart image can be compressed from 16,384 pixels to 2,898 number values, while a 128x128 checker board wouldn't be compressed at all because none of the same pixels are repeated consecutively.

I hope this helped! :-)