The Concept of Compression
Compression
Compression is the process of reducing the size of data files to save storage space and improve transmission efficiency.
It plays a crucial role in modern computing, enabling faster data transfer, reduced storage costs, and optimized performance.
Compression is essential for managing large data sets, optimizing web content delivery, and enhancing application performance by minimizing load times and storage requirements.
Lossless vs. Lossy Compression
Lossless compression
Lossless compression encodes the data with an algorithm that can be reversed to restore it, so the original data can be perfectly reconstructed from the compressed version.
It achieves this by identifying and eliminating redundant patterns without losing any information.
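As a minimal sketch of the round trip described above, the following uses Python's standard `zlib` module. The sample data is made up for illustration, and `zlib` is only one of many lossless compressors.

```python
import zlib

# Highly redundant sample data (invented for this illustration).
original = b"ABABABABAB" * 100

compressed = zlib.compress(original)    # encode
restored = zlib.decompress(compressed)  # reverse the encoding

print(len(original), "bytes before,", len(compressed), "bytes after")
print("exact round trip:", restored == original)
```

The comparison at the end is the defining property of lossless compression: the restored bytes are identical to the original.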
Lossy compression
Lossy compression reduces file size by discarding some data that is less noticeable or redundant, resulting in an approximation of the original content.
It achieves higher compression ratios than lossless methods by removing details that have minimal impact on perception, making it ideal for multimedia formats like images, audio, and video.
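One simple way to see why lossy compression is only an approximation is to reduce the precision of each sample. The sketch below is an illustration, not how any particular codec works: the sample values, the `step` size, and the function names `quantize` and `dequantize` are all assumptions chosen for the example.

```python
def quantize(samples, step):
    """Map each sample to a coarse level; the remainder is discarded (lossy)."""
    return [s // step for s in samples]

def dequantize(levels, step):
    """Rebuild approximate samples, using the middle of each level's range."""
    return [q * step + step // 2 for q in levels]

samples = [12, 13, 15, 200, 203, 198, 54, 55]  # made-up 8-bit values (0..255)
step = 16  # with 8-bit input, each level then fits in 4 bits instead of 8

levels = quantize(samples, step)
approx = dequantize(levels, step)

print("levels:", levels)
print("approx:", approx)
print("max error:", max(abs(a - s) for a, s in zip(approx, samples)))
```

The reconstructed values are close to the originals but not equal to them, and nothing in `levels` allows the exact originals to be recovered.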
| Aspect | Lossless Compression | Lossy Compression |
|---|---|---|
| Data Integrity | Preserves original data | Discards some data |
| Compression Ratio | Lower | Higher |
| Applications | Backups, archival storage, text files | Images, audio, video |
| Reversibility | Reversible (exact reconstruction) | Irreversible (data loss is permanent) |
| Perceptual Redundancy | Does not exploit perceptual redundancy | Exploits perceptual redundancy to discard less noticeable data |
The choice between lossless and lossy compression depends on the specific application and the trade-off between file size and data integrity.
- Lossless compression requires a decompression algorithm to be run before the data can be accessed.
- This may be a problem on small embedded systems that do not have the resources to decompress the file.
- With lossy compression that simply discards data (for example, audio stored at a lower sample rate or bit depth), there is nothing to reconstruct, so the file can be used as is. Codec-based formats such as JPEG and MP3 still need a decoder.
- This simple approach suits small embedded systems that have limited processing power.
- E.g. a child's toy that plays sounds has limited processing power and storage, so lossy compression is the better fit.
Lossless Compression Methods
Huffman
- Analyses the frequency of symbols (letters, numbers, etc.) in a file.
- Assigns shorter binary codes to more common symbols, and longer codes to rare ones.
- This produces a prefix-free code, built as a binary tree: no code is a prefix of another, so the bit stream can be decoded without separators.
The result: fewer bits used for common characters → smaller file.
Like giving short nicknames to people you talk to often, and full names to rare contacts.
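The steps above can be sketched in Python as follows. This is a minimal illustration rather than a production encoder: the function names `huffman_codes` and `huffman_decode` and the sample text are assumptions, and the output is a string of "0"/"1" characters rather than packed bits.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Edge case: a single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker stops Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 / 1 to their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def huffman_decode(bits, codes):
    """Decode a bit string; works because no code is a prefix of another."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "abracadabra"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)

print(codes)
print(len(encoded), "bits, versus", len(text) * 8, "bits at 8 bits per character")
```

In this sample, "a" is the most frequent symbol and so receives the shortest code.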
Run-Length Encoding (RLE)
Run-Length Encoding (RLE) is a lossless data compression technique that replaces consecutive repeating occurrences of a symbol with a single instance of the symbol followed by a count of its repetitions.
How RLE Works
- Identify Consecutive Values: RLE scans the data for consecutive occurrences of the same value.
- Replace with Code: Each run is replaced with a code representing the value and its count.
The sequence AAAAABCCC becomes 5A1B3C.
- RLE does not always make content smaller.
- If there are few repeated characters, the encoded version can be larger than the original.
- Example 1: AAAAABCCCC (10 characters) becomes 5A1B4C (6 characters)
- Example 2: HELLO_WORLD (11 characters) becomes 1H1E2L1O1_1W1O1R1L1D (20 characters)
- Example 1 shrinks because it contains long runs of repeated characters, whereas Example 2 grows because it contains almost none.
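The scan-and-replace steps and the two examples above can be reproduced with a short Python sketch. The names `rle_encode` and `rle_decode` are illustrative, and the decoder assumes the original text contains no digits, since a digit would be mistaken for a count.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Replace each run of identical symbols with <count><symbol>."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

def rle_decode(encoded):
    """Expand <count><symbol> pairs; assumes the symbols are never digits."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(symbol * int(count) for count, symbol in pairs)

for sample in ["AAAAABCCCC", "HELLO_WORLD"]:
    encoded = rle_encode(sample)
    print(sample, len(sample), "->", encoded, len(encoded))
    assert rle_decode(encoded) == sample  # lossless round trip
```

Real RLE formats usually store counts as fixed-width binary values instead of decimal text, which avoids the digit ambiguity.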
Applications of RLE
- Text Files: Compressing spaces or repeated characters.
- Images: Efficient for images with large areas of uniform color, such as faxes or simple graphics.
RLE is most effective for data with repetitive patterns but less efficient for data with high variability.
Other lossless methods include dictionary-based compression, such as LZ77 (which points back into a sliding window of recent data) and LZW (which builds an explicit dictionary of repeated strings).
| Method | How it Works | Best For |
|---|---|---|
| Run-Length Encoding (RLE) | Replaces runs with count + value | Simple images, repetitive data |
| Huffman Coding | Uses shorter codes for common items | Text, structured files |
| LZ77 / LZW | Uses dictionary or pointers to past data | Logs, text, backups, ZIP files |
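To make the dictionary idea in the table concrete, here is a minimal LZW sketch. It is a textbook-style illustration under stated assumptions: input characters are assumed to have code points 0 to 255, codes are kept as a Python list of integers rather than packed into bits, and the function names are illustrative.

```python
def lzw_compress(text):
    """Return a list of integer codes for `text` (code points 0..255 assumed)."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new string
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes):
    """Rebuild the text, reconstructing the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)

text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), "characters ->", len(codes), "codes")
```

The decoder never receives the dictionary: it rebuilds the same entries from the code stream, which is why only the codes need to be stored.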
Lossy Compression
- Some data is removed permanently to reduce file size.
- Used for images, audio, and video, where perfect accuracy isn’t always needed.
- Examples:
- JPEG (images)
- MP3 (audio)
- MP4 (video, typically encoded with a lossy codec such as H.264)
- Lossy compression can’t be reversed.
- Once it’s lost, it’s gone.
Transform Coding
- Used in JPEG, MP3, etc.
- Converts raw data (like pixels or sound waves) into a frequency-based format.
- Removes or coarsens details the human eye or ear is unlikely to notice, often the high-frequency components.
- Then compresses the transformed data more efficiently.
Transform coding
Transform coding is a lossy compression technique that converts data from the spatial or time domain into a different mathematical representation, typically using frequency-based transformations like the Discrete Cosine Transform (DCT) or Discrete Wavelet Transform (DWT).
How Transform Coding Works
- Apply a Transformation: Data is transformed into a different representation, often highlighting its statistical structure.
- Quantize and Compress the Transformed Data: Less important coefficients are coarsened or dropped (this is the lossy step), and the remaining redundancy is reduced in the transformed space.
Transform coding is powerful for data with complex patterns, achieving higher compression ratios than simpler methods like RLE.
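The transform-then-discard idea can be sketched with a small one-dimensional DCT written in plain Python. This is an illustration under simplifying assumptions: the eight sample values are made up, the helper names `dct` and `idct` are illustrative, and high-frequency coefficients are simply set to zero, whereas real codecs such as JPEG quantize coefficients and then entropy-code them.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(signal))
        out.append(scale * total)
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

signal = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up sample values
coeffs = dct(signal)

# Lossy step: keep the 4 lowest-frequency coefficients, zero the rest.
kept = coeffs[:4] + [0.0] * (len(coeffs) - 4)
approx = idct(kept)

print("coefficients:", [round(c, 1) for c in coeffs])
print("approximation:", [round(a, 1) for a in approx])
```

Without the zeroing step, `idct(dct(signal))` returns the original values up to floating-point rounding, so all of the loss comes from the discarded coefficients.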
Why Compression Matters
- Storage Efficiency: Compressed files require less disk space, reducing storage costs.
- Faster Transmission: Smaller files transfer more quickly over networks, improving performance.
- Resource Optimization: Compression minimizes bandwidth usage and load times for web content and applications.
- When choosing a compression method, consider the trade-off between file size and data quality.
- Lossless compression is ideal for critical data, while lossy compression is suitable for media files where some quality loss is acceptable.
- Efficiency vs. Quality:
- How do you balance the need for smaller file sizes with the preservation of data quality?
- Ethical Considerations:
- What are the implications of using lossy compression in contexts where data integrity is critical, such as medical imaging or legal documents?