Run-Length Encoding (RLE) is one of the few compression techniques in IB Computer Science that students are expected to understand procedurally, not just conceptually. It is a form of lossless compression, and IB exam questions often require students to apply it step by step rather than simply define it.
Many students lose marks by misunderstanding how RLE works or by applying it in situations where it is not suitable. This article explains run-length encoding clearly, from basic principles to exam expectations.
What Is Run-Length Encoding?
Run-length encoding is a lossless compression method that reduces file size by:
- Identifying repeated values
- Replacing them with a single value and a count
It works best when data contains long sequences of repeated values, known as runs.
Because no data is removed, the original file can be reconstructed perfectly.
Why Run-Length Encoding Is Lossless
RLE does not remove any information. Instead, it:
- Stores how many times a value appears consecutively
- Allows the original sequence to be rebuilt exactly
This makes RLE suitable for situations where accuracy is essential.
Step-by-Step Example of Run-Length Encoding
Consider the following data sequence:
AAAAABBCCCCDD
Step 1: Identify runs of repeated characters
- A appears 5 times
- B appears 2 times
- C appears 4 times
- D appears 2 times
Step 2: Replace each run with a value and a count
The encoded result might be written as:
