Understanding Data Representation
You've probably heard about bits and bytes, but what exactly are they?
Bit
The smallest unit of data in a computer, representing a binary value of 0 or 1.
The term bit is short for binary digit.
Analogy: A single bit can represent two states, such as on/off, true/false, or yes/no.
Byte
The fundamental unit of data storage, consisting of eight bits.
Bits are usually denoted by a lowercase b and bytes by a capital B (1 B = 8 b).
Common Mistake:
- It's crucial not to confuse bits and bytes.
- For example, 12 Mb (megabits) is not the same as 12 MB (megabytes).
- Since 1 byte = 8 bits, 12 MB is eight times larger than 12 Mb.
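The conversion between the two units can be sketched in a few lines of Python. The function names below are illustrative and not part of the original notes; the only fact relied on is 1 byte = 8 bits.

```python
BITS_PER_BYTE = 8  # 1 B = 8 b


def megabits_to_megabytes(megabits):
    """Convert a quantity in megabits (Mb) to megabytes (MB)."""
    return megabits / BITS_PER_BYTE


def megabytes_to_megabits(megabytes):
    """Convert a quantity in megabytes (MB) to megabits (Mb)."""
    return megabytes * BITS_PER_BYTE


print(megabits_to_megabytes(12))  # 12 Mb expressed in MB
print(megabytes_to_megabits(12))  # 12 MB expressed in Mb
```

So 12 Mb is only 1.5 MB, while 12 MB is 96 Mb.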
To work with bits and bytes, we use the binary number system (base 2).
Binary
Binary is a base-2 number system, using only two digits: 0 and 1. Each digit in a binary number is called a bit (short for binary digit).
All data and instructions in a computer are stored and processed in binary form.
Binary Explained
- When working with hexadecimal numbers (base 16), remember that each hexadecimal digit represents four bits (half a byte, also called a nibble).
- This makes it easy to convert between binary and hexadecimal.
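As a rough illustration of why the conversion is easy, the sketch below converts between binary and hexadecimal strictly four bits at a time. The helper names `binary_to_hex` and `hex_to_binary` are assumptions made for this example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits):
    """Convert a binary string to hexadecimal, one group of four bits per digit."""
    # Pad on the left so the length is a multiple of four.
    padded_length = -(-len(bits) // 4) * 4
    padded = bits.zfill(padded_length)
    # Each group of four bits maps to exactly one hexadecimal digit.
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_binary(hex_digits):
    """Convert a hexadecimal string to binary, four bits per digit."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)


print(binary_to_hex("00100011"))  # two groups: 0010 and 0011
print(hex_to_binary("FF"))        # each F expands to 1111
```

Because every group of four bits is handled independently, no arithmetic on the whole number is needed.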
With $n$ bits, $2^n$ distinct values can be represented. For example, 8 bits give $2^8 = 256$ values.
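The $2^n$ relationship can be checked for a few common bit widths. This is a minimal sketch; `distinct_values` is a name chosen for illustration.

```python
def distinct_values(n_bits):
    """Return how many different values n_bits bits can represent."""
    return 2 ** n_bits


for n in (1, 4, 8, 16):
    print(n, "bits ->", distinct_values(n), "values")
```

Each extra bit doubles the number of values that can be represented.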
Storage of Different Kinds of Data
Here are some interesting examples of how different entities are represented in binary and stored in memory.
Integers
We can distinguish two types of integers: unsigned and signed.
Unsigned integers:
- Use all the bits to represent non-negative values; an 8-bit unsigned integer uses all 8 bits for the magnitude.
- Range: an 8-bit unsigned integer represents values from 0 to 255.
For example, $35$ is stored as $00100011$.
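The 8-bit unsigned example above can be reproduced with a short sketch. The function names and the default width of 8 bits are assumptions for illustration.

```python
def to_unsigned_bits(value, width=8):
    """Return the unsigned binary representation of value using width bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} unsigned bits")
    return format(value, f"0{width}b")


def from_unsigned_bits(bits):
    """Sum the place values (powers of two) of every bit that is set."""
    return sum(2 ** i for i, bit in enumerate(reversed(bits)) if bit == "1")


print(to_unsigned_bits(35))            # 32 + 2 + 1
print(from_unsigned_bits("00100011"))  # back to decimal
print(to_unsigned_bits(255))           # largest 8-bit unsigned value
```

Values outside 0 to 255 raise a `ValueError` here, which mirrors the fact that they cannot be stored in 8 unsigned bits.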
Signed integers:
- Use one bit for the sign and, in an 8-bit integer, the remaining 7 bits for the value.
- The most significant bit (MSB), which is the leftmost bit, is used as the sign bit (0 for positive, 1 for negative).
- Range: an 8-bit signed integer in two's complement represents values from -128 to 127.
Storing negative integers poses additional challenges: for instance, the sum of a number and its additive inverse (such as $5$ and $-5$) should be $0$. Therefore, there are multiple methods of representing negative integers.
Signed Magnitude, One's and Two's Complement
Of these, two's complement is the most widely used.
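To compare the three methods, the sketch below encodes $-5$ in 8 bits with each of them and then adds the bit patterns for $5$ and $-5$ as plain binary numbers. The function names are illustrative, and the code assumes the value fits in the chosen width.

```python
WIDTH = 8  # assumed word size for this example


def sign_magnitude(value, width=WIDTH):
    """Sign bit followed by the magnitude in the remaining bits."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{width - 1}b")


def ones_complement(value, width=WIDTH):
    """Negative values are formed by inverting every bit of the magnitude."""
    bits = format(abs(value), f"0{width}b")
    if value < 0:
        bits = "".join("1" if b == "0" else "0" for b in bits)
    return bits


def twos_complement(value, width=WIDTH):
    """Negative values are formed by inverting every bit and adding one."""
    return format(value % (2 ** width), f"0{width}b")


def add_patterns(a, b, width=WIDTH):
    """Add two bit patterns as unsigned numbers, discarding any carry out."""
    return format((int(a, 2) + int(b, 2)) % (2 ** width), f"0{width}b")


print(sign_magnitude(-5), ones_complement(-5), twos_complement(-5))
print(add_patterns(twos_complement(5), twos_complement(-5)))  # all zeros
print(add_patterns(sign_magnitude(5), sign_magnitude(-5)))    # not zero
```

Only the two's complement patterns sum to zero under ordinary binary addition, which is one reason the method is so common.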
Character Symbols
Text can be stored as a string type - a sequence of characters.
Each character in a string is encoded using a character encoding scheme (e.g., ASCII, or UTF-8, which encodes the Unicode character set).
Example: The string "Hello" in UTF-8 requires 5 bytes (one byte per character).
Note: Unicode vs ASCII encoding
- ASCII is limited to 128 characters, which is insufficient for non-Latin alphabets.
- Unicode provides a universal character set, supporting more than 100,000 characters from a wide range of languages and scripts.
When working with text, strings and characters, always consider the encoding scheme, as it affects the storage size and compatibility across systems.
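The effect of the encoding on storage size can be seen with Python's built-in `str.encode`. The sample strings and helper names below are chosen for illustration; non-ASCII characters are written as escape sequences so the snippet does not depend on the file's own encoding.

```python
def utf8_length(text):
    """Return the number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_ascii(text):
    """Return True if every character of text can be encoded in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


samples = [
    "Hello",       # ASCII letters only
    "caf\u00e9",   # "cafe" with an acute accent on the e
    "\u20ac",      # euro sign
    "\U0001F600",  # an emoji
]

for text in samples:
    print(len(text), "characters,", utf8_length(text), "bytes, ASCII:", fits_in_ascii(text))
```

Characters in the ASCII range take one byte each in UTF-8, while other characters take two to four bytes.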
Colours
There are different approaches when it comes to representing colours. The most popular include those you have probably seen before:
RGB Model:
- Colours are represented as combinations of Red, Green, and Blue.
- Each component is typically stored as an 8-bit value, ranging from 0 to 255.
For convenience, RGB colours are often written in hexadecimal as six-digit codes, with two hexadecimal digits for each component.
Example: #FF5733 represents:
- FF (Red) = 255
- 57 (Green) = 87
- 33 (Blue) = 51
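The breakdown above can be reproduced with a small sketch that splits a six-digit code into its three components. `parse_hex_colour` and `to_hex_colour` are hypothetical helper names, not part of any library.

```python
def parse_hex_colour(code):
    """Split a six-digit hexadecimal colour code into (red, green, blue)."""
    digits = code[1:] if code.startswith("#") else code
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Two hexadecimal digits (8 bits) per component, each in the range 0 to 255.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue


def to_hex_colour(red, green, blue):
    """Format three 8-bit components as a six-digit hexadecimal colour code."""
    return f"#{red:02X}{green:02X}{blue:02X}"


print(parse_hex_colour("#FF5733"))
print(to_hex_colour(255, 87, 51))
```

Since each component is 8 bits, a full RGB colour occupies 24 bits, or 3 bytes.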
Choosing the right data type and encoding scheme is crucial for optimising storage and performance.
Binary as a Universal Language
Binary is the foundational language of computers, enabling communication and data processing across all systems.
However, while binary is universal for machines, it is not a practical language for human communication. This raises a question:
Theory of Knowledge: Does binary qualify as a universal "lingua franca"? Consider its role in enabling communication between machines versus its usability for humans.
Self review:
- What is the difference between a bit and a byte?
- How does the binary number system differ from the decimal system?
- Why is hexadecimal often used in computing?
- How does the choice of data representation impact the efficiency of a program?
- Why is Unicode essential for global communication in computing?
- How does hexadecimal representation simplify working with colours in digital systems?