How Binary is Used to Store Data
The Fundamentals of Binary Encoding
- As explored in the previous chapter, binary is a base-2 numeral system that uses only two digits: 0 and 1.
- Each digit is called a bit (short for binary digit).
- Everything in a computer is ultimately represented as 1s and 0s.
- This is because computers are built from transistors that act as switches, either on (1) or off (0), which maps naturally onto binary.
Binary is like a digital Morse code: only two symbols, but endless combinations.
Binary is the natural representation for computers because it maps directly onto their hardware, whose transistors operate in two states.
Binary Encoding of Different Data Types
Integers
Unsigned Integers
- Use all bits to represent non-negative values.
- 8 bits can represent values from 0 to 255 ($2^8 - 1$).
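The conversion can be sketched in Python, assuming bit patterns are written as strings of `0` and `1` characters (`unsigned_value` and `unsigned_range` are illustrative names, not library functions). Each bit doubles the running total and then adds itself, and an n-bit pattern tops out at $2^n - 1$.

```python
def unsigned_value(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as an unsigned integer."""
    value = 0
    for bit in bits:
        # Shift the running total one place left (x2) and add the next bit.
        value = value * 2 + int(bit)
    return value


def unsigned_range(n_bits: int) -> tuple[int, int]:
    """Smallest and largest values an n-bit unsigned integer can hold."""
    return 0, 2 ** n_bits - 1


print(unsigned_value("11111111"))  # 255
print(unsigned_range(8))           # (0, 255)
```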
Signed Integers
Two's Complement
Two's complement is a method for representing signed (positive, negative, and zero) binary numbers in computers. It's the most common way to represent integers in digital systems, enabling the use of the same hardware for both addition and subtraction operations.
- Negative values are represented using two's complement,
- where the most significant bit (MSB) has a negative place value (e.g. $-2^3 = -8$ in a 4-bit number), so it acts as the sign bit.
- If the MSB is 1, the number is negative (e.g. 1011 = -8 + 2 + 1 = -5).
- If the MSB is 0, its negative place value contributes nothing, so the number is zero or positive (e.g. 0001 = -0 + 1 = 1).
- 8 bits can represent values from -128 to +127.
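A minimal Python sketch of decoding and encoding two's complement, again assuming bit strings of `0`/`1` characters; the helper names are hypothetical. Decoding gives the MSB a weight of $-2^{n-1}$, and encoding masks with $2^n - 1$, which wraps a negative number round to its two's complement pattern.

```python
def twos_complement_value(bits: str) -> int:
    """Decode a two's complement bit string: the MSB has a negative place value."""
    n = len(bits)
    msb_weight = -(2 ** (n - 1))               # e.g. -8 for a 4-bit number
    rest = int(bits[1:], 2) if n > 1 else 0    # remaining bits are ordinary positive place values
    return int(bits[0]) * msb_weight + rest


def to_twos_complement(value: int, n_bits: int) -> str:
    """Encode an integer as an n_bits two's complement bit string."""
    low, high = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    # Masking with 2**n_bits - 1 turns a negative number into its wrapped bit pattern.
    return format(value & (2 ** n_bits - 1), f"0{n_bits}b")


print(twos_complement_value("1011"))  # -5
print(to_twos_complement(-5, 4))      # 1011
```

As a check of the "same hardware" point: adding 0101 (5) and 1011 (-5) as plain binary gives 10000, and discarding the carry out of the 4-bit result leaves 0000, which is 0.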
Sign and Magnitude
A signed binary integer representation that uses the MSB to indicate 0 for + and 1 for - while the remaining place values indicate the number being represented.
- The most significant bit (MSB) indicates the sign
- a 1 indicates Negative
- a 0 indicates Positive
- The remaining place values are then used as normal to calculate the magnitude of the number
- Example 4 bit number: 1011
- MSB is 1 so negative
- The remaining 3 bits 011 is 3
- The number is therefore -3
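The same decoding in Python, assuming a bit string of at least two `0`/`1` characters (`sign_magnitude_value` is an illustrative name):

```python
def sign_magnitude_value(bits: str) -> int:
    """Decode a sign-and-magnitude bit string."""
    sign = -1 if bits[0] == "1" else 1   # MSB: 1 = negative, 0 = positive
    magnitude = int(bits[1:], 2)         # remaining bits give the size of the number
    return sign * magnitude


print(sign_magnitude_value("1011"))  # -3
```

One consequence worth noticing: both 1000 and 0000 decode to 0, so sign and magnitude has two patterns for zero, which two's complement avoids.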
Binary-coded decimal
- Uses 4 bits of binary to represent each decimal digit of a number.
- This is useful to avoid rounding errors from converting decimal values to binary, especially in financial systems.
- However, it requires more space and more complex processing to store values and perform arithmetic.
- Example: 1001 0011 is 93
- 1001 = 9
- 0011 = 3
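A Python sketch of BCD encoding and decoding, assuming non-negative integers and space-separated 4-bit groups (`to_bcd` and `from_bcd` are hypothetical helpers):

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as BCD: 4 bits per decimal digit."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bcd: str) -> int:
    """Decode space-separated 4-bit groups back into a decimal integer."""
    digits = []
    for nibble in bcd.split():
        digit = int(nibble, 2)
        if digit > 9:  # 1010 to 1111 are unused in BCD
            raise ValueError(f"{nibble} is not a valid BCD digit")
        digits.append(str(digit))
    return int("".join(digits))


print(to_bcd(93))            # 1001 0011
print(from_bcd("1001 0011")) # 93
```

With 8 bits, BCD can only hold 0 to 99, whereas plain unsigned binary holds 0 to 255, which is the extra space cost mentioned above.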
Fixed Point Numbers
- The binary point is placed in a predetermined position, e.g. after the first 4 bits (4 bits before the point and 4 after).
- This is a simple and quick method for representing fractional values.
- However, it lacks precision and range, since the number of integer and fraction bits is fixed.
- Starting from the $2^0$ place, each place to the left raises the power by one ($2^1$, $2^2$, ...), and each place to the right of the point lowers it into negative powers ($2^{-1}$, $2^{-2}$, ...).
| Power of 2 | $2^3$ | $2^2$ | $2^1$ | $2^0$ | . | $2^{-1}$ | $2^{-2}$ | $2^{-3}$ | $2^{-4}$ |
|---|---|---|---|---|---|---|---|---|---|
| Place value | 8 | 4 | 2 | 1 | . | 0.5 | 0.25 | 0.125 | 0.0625 |
| Bit | 0 | 1 | 1 | 0 | . | 0 | 1 | 1 | 1 |
| Value (bit × place value) | 0 | 4 | 2 | 0 | . | 0 | 0.25 | 0.125 | 0.0625 |
Calculation: $(1 \times 4)+(1 \times 2)+(1 \times 0.25)+(1 \times 0.125)+(1 \times 0.0625) = 6.4375$
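Because the binary point is fixed, decoding amounts to reading the whole pattern as an integer and dividing by $2^{f}$, where $f$ is the number of fraction bits. A Python sketch, assuming an unsigned pattern with 4 integer and 4 fraction bits as in the table (`fixed_point_value` is a hypothetical helper):

```python
def fixed_point_value(bits: str, fraction_bits: int) -> float:
    """Decode an unsigned fixed-point bit string with a set number of fraction bits."""
    as_integer = int(bits, 2)
    # Dividing by 2**fraction_bits moves the binary point left by fraction_bits places.
    return as_integer / 2 ** fraction_bits


print(fixed_point_value("01100111", 4))  # 6.4375
```

The smallest step here is $2^{-4} = 0.0625$, which is where the limited precision comes from: a value such as 0.1 falls between 0.0625 and 0.125 and cannot be stored exactly.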
Strings and Characters
ASCII
- Uses 7 bits to represent 128 characters.
- Extended ASCII uses 8 bits (256 characters) to add accented characters and extra symbols.
- Each character is assigned a number, and that number is converted into binary.
- ASCII only covers English-like characters because of the limited number of bits available.
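A short Python sketch of the character-to-number-to-binary step. It relies on the built-in `ord`, which returns a character's Unicode code point; for the first 128 characters this is the same number as the ASCII code. `to_ascii_binary` is an illustrative name.

```python
def to_ascii_binary(text: str) -> list[str]:
    """Convert each character to its 7-bit ASCII code written in binary."""
    codes = []
    for char in text:
        code = ord(char)  # the number assigned to the character
        if code > 127:
            raise ValueError(f"{char!r} is not a 7-bit ASCII character")
        codes.append(format(code, "07b"))
    return codes


print(to_ascii_binary("Hi"))  # ['1001000', '1101001']
```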
Unicode
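- Assigns a number (code point) to characters from all writing systems, stored using encodings such as UTF-8, which is variable-width.
- Defines over 100,000 characters.
- Includes emojis, accents, symbols and global scripts (e.g. 汉字, русский).
- A Unicode character can take up to 32 bits: UTF-8 uses 1 to 4 bytes per character and UTF-16 uses 2 or 4 bytes, so text can take more space than in ASCII, but the scheme is far more flexible.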
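Python's built-in `str.encode("utf-8")` shows the variable width directly. The helper names below are illustrative; the byte counts come from the UTF-8 encoding itself.

```python
def utf8_byte_count(char: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(char.encode("utf-8"))


def utf8_bits(char: str) -> str:
    """The character's UTF-8 bytes written out in binary."""
    return " ".join(format(byte, "08b") for byte in char.encode("utf-8"))


for char in ["A", "é", "д", "汉", "😀"]:
    print(char, utf8_byte_count(char), utf8_bits(char))
```

ASCII characters such as `A` keep their single-byte ASCII code in UTF-8, so plain English text takes no extra space; characters from other scripts and emojis use 2 to 4 bytes.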
Images
- Colour images commonly use the Red Green Blue (RGB) colour model.
- An image is split into pixels (picture elements): individual squares in the picture, each showing a specific colour.
- The number of pixels, width × height, is referred to as the resolution.
- Each pixel shows its colour using a mix of red, green and blue light.
- When the pixels are displayed together in a grid, much like a collage, they form an image.
- Colour depth refers to the number of bits allocated to each pixel to represent the different colours.
- E.g. 1 bit: only has 2 options (either 1 or 0) so only 2 colours.
- 8 bits per RGB channel (24 bits per pixel): each channel can take values 0-255, giving $256^3 \approx 16.8$ million different colours (known as true colour, the most commonly used colour depth).
- Pure RED in RGB binary would be: 11111111 00000000 00000000
- WHITE: highest intensity (all the lights fully on): 11111111 11111111 11111111
- BLACK: lowest intensity (all the lights off): 00000000 00000000 00000000
- The uncompressed file size of an image, in bits, is the resolution (width × height) multiplied by the colour depth.
- An image stored as a grid of pixel values like this is called a bitmap.
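A Python sketch tying colour depth, pixel encoding and file size together. The 1920 × 1080 image is a hypothetical example, and the size ignores file headers and compression that real formats add; the function names are illustrative.

```python
def number_of_colours(colour_depth: int) -> int:
    """Distinct colours available with colour_depth bits per pixel."""
    return 2 ** colour_depth


def rgb_to_binary(red: int, green: int, blue: int) -> str:
    """24-bit true colour: each channel (0-255) stored as 8 bits."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return " ".join(format(channel, "08b") for channel in (red, green, blue))


def bitmap_size_bits(width: int, height: int, colour_depth: int) -> int:
    """Uncompressed bitmap size: resolution x colour depth (file headers ignored)."""
    return width * height * colour_depth


print(number_of_colours(1), number_of_colours(24))  # 2 16777216
print(rgb_to_binary(255, 0, 0))                     # 11111111 00000000 00000000 (pure red)
size = bitmap_size_bits(1920, 1080, 24)             # hypothetical 1920 x 1080 image
print(size / 8 / 1024 ** 2)                         # about 5.93 megabytes
```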
Think of cutting small squares of colour out of magazines and sticking them together on a canvas to make a collage.
- This is similar to how bitmaps work.
- The smaller and more numerous the squares you cut, and the more colours you can find, the better the image quality.
- The same is true for computers: higher resolution and higher colour depth give better image quality.
Audio
- Normal sound that we hear is analogue, meaning it is continuous data.
- Digital sound is discrete, meaning it has been measured and recorded as an exact value at specific points in time.
- To record sound and create digital audio, sampling has to occur first, followed by quantization.
Sampling
- Measures the amplitude (volume) of the sound waves at regular intervals
- Sample rate or sample frequency refers to how often samples are taken.
- Measured in hertz (Hz, samples per second) or kilohertz (kHz, thousands of samples per second).
- Common sample rates: 44.1 kHz, 48 kHz and, for high-definition sound, 96 kHz.
- Each sample is then stored using a fixed bit depth: the number of bits per sample, which sets how many predetermined values ($2^{\text{bit depth}}$) the measured amplitude can be matched to.
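Sampling can be sketched in Python by measuring a signal at regular intervals of $1 / \text{sample rate}$ seconds. This assumes a pure sine wave with amplitudes between -1 and 1 as a stand-in for a real microphone signal; `sample_wave` is a hypothetical helper.

```python
import math


def sample_wave(frequency_hz: float, sample_rate_hz: int, duration_s: float) -> list[float]:
    """Measure a pure sine wave's amplitude every 1 / sample_rate_hz seconds."""
    n_samples = round(sample_rate_hz * duration_s)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz  # time of the i-th sample in seconds
        samples.append(math.sin(2 * math.pi * frequency_hz * t))
    return samples


samples = sample_wave(frequency_hz=440, sample_rate_hz=44_100, duration_s=1.0)
print(len(samples))  # 44100 samples for one second at 44.1 kHz
```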
Quantization
- Converts these measurements into binary numbers using the bit depth.
- Matches each measured amplitude to the closest available level.
- 16-bit audio represents each sample as a 16-bit binary number, giving $2^{16} = 65{,}536$ possible levels.
- Mono - single channel, so would only have 1 set of samples.
- Stereo - two channels, so would have 2 sets of samples (left and right).
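A Python sketch of quantization, assuming amplitudes between -1.0 and 1.0 and signed integer levels stored in two's complement, the convention used by common 16-bit PCM audio. The function names are illustrative.

```python
def quantize(amplitude: float, bit_depth: int) -> int:
    """Snap an amplitude in [-1.0, 1.0] to the nearest signed integer level."""
    max_level = 2 ** (bit_depth - 1) - 1                # e.g. 32767 for 16-bit audio
    level = round(amplitude * max_level)                # nearest available level
    return max(-max_level - 1, min(max_level, level))   # clamp to the representable range


def level_to_binary(level: int, bit_depth: int) -> str:
    """Store the level as a two's complement binary number of bit_depth bits."""
    return format(level & (2 ** bit_depth - 1), f"0{bit_depth}b")


print(quantize(1.0, 16), level_to_binary(quantize(1.0, 16), 16))
print(quantize(0.2, 3), quantize(0.4, 3))  # both land on level 1 at 3-bit depth
```

The last line shows why a low bit depth loses detail: two clearly different amplitudes end up stored as the same number.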
Uncompressed sound file size is calculated in bits as follows:
- Number of channels × Bit depth × Time (in seconds) × Sample rate (in Hz)
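The formula translates directly into Python. The example track is a hypothetical 3-minute stereo recording at 16-bit, 44.1 kHz (CD quality), and `audio_size_bits` is an illustrative name.

```python
def audio_size_bits(channels: int, bit_depth: int, seconds: int, sample_rate_hz: int) -> int:
    """Uncompressed audio size in bits: channels x bit depth x time x sample rate."""
    return channels * bit_depth * seconds * sample_rate_hz


bits = audio_size_bits(channels=2, bit_depth=16, seconds=180, sample_rate_hz=44_100)
print(bits)                  # 254016000 bits
print(bits / 8 / 1024 ** 2)  # roughly 30.3 megabytes
```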
Video
- Real life is continuous and analogue, so to create a digital version of it, samples (frames) have to be recorded.
- Frames per Second (FPS) specifies how many images a camera captures per second.
- More frames = smoother looking video.
Frames
- Each frame is a bitmap image, with pixels encoded in binary.
- Capturing at 60 FPS would be 60 images to show in 1 second, 120 FPS would be 120 images to show in that second.
- The images are played in quick sequence, one after the other, to create the illusion of motion.
Video frames per second are a little like a flip-book animation. The more pages (frames) you draw (capture) with small changes between them, and the faster you flick through them (FPS), the smoother your animation is.
Audio
- Audio is encoded as described in the Audio section above.
- A video has one or more audio tracks recorded alongside it, with timestamps matched so that the sound lines up with the frames.
- Combining the synchronised audio and video streams into one file is called multiplexing.
Compression
- Formats like H.264 reduce the size of video files, largely by encoding only the changes between frames rather than every frame in full.
- This keeps quality acceptable without creating unreasonably large files.
If no compression were applied to a video, even a short 30-second clip (like a reel) would be calculated as follows:
- 30 (sec) × 60 (fps) × 48,000,000 (pixels per frame, the 48 MP resolution of the iPhone 15 camera) × 24 (colour depth) = 2,073,600,000,000 bits; ÷ 8 = 259,200,000,000 bytes ≈ 241.4 gigabytes (using 1 gigabyte = $1024^3$ bytes)
- Due to compression, in reality this is more like 340 megabytes if filmed in 4K resolution at 60 FPS
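The same arithmetic in Python, with a second, hypothetical case for a 4K frame (assumed here to be 3840 × 2160 pixels) so the uncompressed size can be compared like-for-like with the compressed 4K figure. The audio track and file headers are ignored, and the function names are illustrative.

```python
def uncompressed_video_bits(seconds: int, fps: int, pixels_per_frame: int, colour_depth: int) -> int:
    """Size of raw video frames in bits (audio track and file headers ignored)."""
    return seconds * fps * pixels_per_frame * colour_depth


def bits_to_gigabytes(bits: int) -> float:
    """Convert bits to gigabytes using 1 gigabyte = 1024**3 bytes."""
    return bits / 8 / 1024 ** 3


reel_48mp = uncompressed_video_bits(30, 60, 48_000_000, 24)
reel_4k = uncompressed_video_bits(30, 60, 3840 * 2160, 24)
print(round(bits_to_gigabytes(reel_48mp), 1))  # 241.4
print(round(bits_to_gigabytes(reel_4k), 1))    # 41.7
```

Even at 4K, the raw frames would be around 41.7 gigabytes, so getting down to a few hundred megabytes means compression is discarding or reusing the vast majority of the data.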
Binary encoding allows for the efficient storage of data, with encoding schemes optimized for each type of data to minimize space while keeping the quality that is needed.
Binary Encoding, Storage and Retrieval of Data
Binary Encoding
- Process of representing items such as images, videos, numbers, text and sound as binary.
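- Everything on a computer is therefore ultimately stored as 1s and 0s.
- 1 bit is the smallest unit. The table below uses the binary convention, where each unit is 1024 times the previous one (in SI units, 1 kilobyte = 1000 bytes, and the IEC name for 1024 bytes is a kibibyte).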
| Unit | Bits |
|---|---|
| 1 bit | 1 |
| 1 nibble | 4 |
| 1 byte | 8 |
| 1 kilobyte | $1024 \times 8$ |
| 1 megabyte | $1024^2 \times 8$ |
| 1 gigabyte | $1024^3 \times 8$ |
| 1 terabyte | $1024^4 \times 8$ |
| 1 petabyte | $1024^5 \times 8$ |
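A Python sketch of the table, assuming the same 1024-based convention; `unit_in_bits` and `describe` are hypothetical helpers.

```python
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte"]


def unit_in_bits(unit: str) -> int:
    """Bits in one unit, using the 1024-based convention from the table."""
    return 1024 ** UNITS.index(unit) * 8


def describe(bits: int) -> str:
    """Express a number of bits in the largest unit that keeps the value at or above 1."""
    value = bits / 8  # start in bytes
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}s"
        value /= 1024


print(unit_in_bits("kilobyte"))  # 8192
print(describe(254_016_000))     # 30.28 megabytes
```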
Data Storage
- Binary encoding minimises space usage by applying encoding and compression algorithms suited to specific file types, while preserving as much of the data's quality as is needed.
- This approach has continued through every generation of storage technology, from magnetic to optical to solid-state: all of them store binary 1s and 0s.
- Standards ensure that data can then be retrieved across different systems.
Data Retrieval
- Efficiency: Binary encoding enables rapid data retrieval and processing, as computers and CPU architectures are designed to work with binary data.
- Error Detection and Correction: Mechanisms like parity bits help ensure data integrity during storage and transmission.
- When analyzing an algorithm's time complexity, always consider the worst-case scenario first.
- Then evaluate average-case performance, which often provides a more realistic assessment of practical efficiency.
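Parity bits, mentioned above under error detection, can be sketched in Python. This assumes even parity with the parity bit appended at the end (placement is a convention that varies between systems), and the function names are illustrative.

```python
def add_even_parity(data_bits: str) -> str:
    """Append a parity bit so the total number of 1s is even."""
    parity_bit = data_bits.count("1") % 2
    return data_bits + str(parity_bit)


def passes_even_parity(bits: str) -> bool:
    """True if the received bits still contain an even number of 1s."""
    return bits.count("1") % 2 == 0


sent = add_even_parity("1011001")                        # four 1s, so the parity bit is 0
corrupted = ("0" if sent[0] == "1" else "1") + sent[1:]  # flip one bit in transit
print(sent, passes_even_parity(sent))                    # 10110010 True
print(corrupted, passes_even_parity(corrupted))          # 00110010 False
```

A single parity bit detects any one flipped bit but cannot say which bit it was, and two flipped bits cancel out and go unnoticed.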