RevisionDojo

What is a Distributed Database

A distributed database is a database whose data is stored across multiple physical locations, such as separate servers or cloud data centers.
It appears to users as one single database, even though it's spread across sites.

Types of Distributed Databases

Homogeneous
1. All sites use the same DBMS, schema, and OS.
2. Easier to manage.
Heterogeneous
1. Sites may use different DBMSs, schemas, and OSs.
2. Requires translation layers and is harder to maintain.

Methods of Distribution

Fragmentation
1. Database is split into parts, and each site manages a fragment.
2. Useful when each site mostly accesses its own local data, because most queries can then be answered without contacting other sites.
Replication
1. Copies of the data (or whole database) are stored at multiple sites.
2. Improves reliability and availability, but increases complexity.
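
The two methods can be contrasted with a minimal sketch. It assumes a hypothetical list of hotel rooms and splits rows by city (often called horizontal fragmentation); the names ROOMS, fragment_by_city, and replicate are illustrative and do not come from any particular DBMS.

```python
# Hypothetical hotel rows; the "city" column decides which site owns a row.
ROOMS = [
    {"room_id": 1, "city": "London", "available": True},
    {"room_id": 2, "city": "Paris", "available": True},
    {"room_id": 3, "city": "London", "available": False},
]


def fragment_by_city(rows):
    """Fragmentation: each site stores only the rows for its own city."""
    sites = {}
    for row in rows:
        sites.setdefault(row["city"], []).append(row)
    return sites


def replicate(rows, site_names):
    """Replication: every site stores its own full copy of all rows."""
    return {name: [dict(row) for row in rows] for name in site_names}


fragments = fragment_by_city(ROOMS)
replicas = replicate(ROOMS, ["London", "Paris"])

print(len(fragments["London"]))  # London manages only its own 2 rows
print(len(replicas["Paris"]))    # Paris holds a copy of all 3 rows
```

With fragmentation each row lives at one site, so there is nothing to keep in sync; with replication every update must eventually reach every copy.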

The Need for Data Consistency

Data Consistency ensures that all users see the same data, regardless of which part of the database they access.
In a distributed system, data consistency is challenging due to:
1. Network Latency: Delays in data synchronization across locations.
2. Concurrent Access: Multiple users updating the same data simultaneously.

Example

Without data consistency, a hotel room might still appear available at one site after it has already been booked at another, leading to double bookings and customer dissatisfaction.

The Role of ACID in Distributed Databases

ACID stands for Atomicity, Consistency, Isolation, and Durability.
These properties ensure reliable transaction processing, even in distributed environments.

Atomicity

Atomicity guarantees that a transaction is all-or-nothing.
In a distributed database, this means:
1. If a transaction updates data in multiple locations, either all updates succeed, or none do.

Example

In a hotel booking system, if a payment is processed but the room reservation fails, atomicity ensures the payment is rolled back.
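
One widely used way to get all-or-nothing behaviour across sites is a two-phase commit, which these notes do not name; the sketch below uses it as an illustration only. The Site class and run_transaction function are hypothetical, and the sketch ignores network failures and coordinator crashes that a real protocol must handle.

```python
class Site:
    """A hypothetical participant holding part of the transaction's data."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: the site votes on whether it can apply its local update.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def run_transaction(sites):
    """Commit at every site, or roll back at every site."""
    votes = [site.prepare() for site in sites]  # every site is asked to vote
    if all(votes):
        # Phase 2: all sites voted yes, so all of them commit.
        for site in sites:
            site.commit()
        return True
    # Phase 2: at least one site voted no, so all of them undo their work.
    for site in sites:
        site.rollback()
    return False


payment = Site("payment")
reservation = Site("reservation", can_commit=False)  # simulated failure
ok = run_transaction([payment, reservation])
print(ok, payment.state, reservation.state)  # False rolled_back rolled_back
```

Because the reservation site refuses, the payment site is rolled back as well, which matches the hotel example above.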

Consistency

Consistency ensures that a transaction brings the database from one valid state to another.
In distributed systems, this requires:
1. Synchronizing updates across all locations.

Example

After a room is booked, all database copies should reflect the updated availability.

Isolation

Isolation ensures that concurrent transactions do not interfere with each other.
Techniques like locking and timestamping are used to maintain isolation.

Example

Two users booking the last available room simultaneously should not both succeed.
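
The locking technique can be sketched on a single machine with Python threads. This is a simplified stand-in for a database lock, and the RoomInventory class is hypothetical.

```python
import threading


class RoomInventory:
    """Hypothetical inventory protected by a lock."""

    def __init__(self, rooms_left):
        self.rooms_left = rooms_left
        self._lock = threading.Lock()

    def book(self):
        # The availability check and the update run as one step under the
        # lock, so a second booking cannot slip in between them.
        with self._lock:
            if self.rooms_left > 0:
                self.rooms_left -= 1
                return True
            return False


inventory = RoomInventory(rooms_left=1)
results = []


def try_to_book():
    results.append(inventory.book())


threads = [threading.Thread(target=try_to_book) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(results))  # [False, True]: exactly one booking succeeds
```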

Durability

Durability guarantees that once a transaction is committed, it remains so, even in the event of a system failure.
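This is typically achieved by writing committed changes to non-volatile storage (for example, a transaction log), supported by replication and backup strategies.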

Example

If a power outage occurs after a booking is confirmed, the reservation should still be recorded when the system restarts.
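
One common way to make a confirmed booking survive a restart is to append it to a log on disk before confirming it. The following is a minimal sketch of that idea, assuming a hypothetical JSON-lines log file; commit_booking and recover are illustrative names, and a real DBMS log is considerably more involved.

```python
import json
import os
import tempfile


def commit_booking(log_path, booking):
    """Append the booking to the log and force it to disk before confirming."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(booking) + "\n")
        log.flush()
        os.fsync(log.fileno())  # ask the OS to write the data to storage
    return True


def recover(log_path):
    """After a restart, rebuild the committed bookings from the log."""
    if not os.path.exists(log_path):
        return []
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


log_path = os.path.join(tempfile.mkdtemp(), "bookings.log")
commit_booking(log_path, {"room_id": 7, "guest": "Ana"})

# Reading the log back stands in for the system restarting after an outage.
recovered = recover(log_path)
print(recovered)  # [{'room_id': 7, 'guest': 'Ana'}]
```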

Key Features of Distributed Databases

Concurrency Control

Concurrency Control ensures that multiple transactions can occur simultaneously without causing data inconsistencies.
Two main approaches:
1. Pessimistic Concurrency Control (PCC): Locks resources to prevent conflicts.
2. Optimistic Concurrency Control (OCC): Assumes conflicts are rare and checks for them at commit time.

Tip

Use PCC in high-conflict environments and OCC when conflicts are unlikely.
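
The optimistic approach can be sketched with a version number that is checked at commit time. The VersionedRecord class is hypothetical and single-threaded; a real system would also make the check and the write one atomic step.

```python
class VersionedRecord:
    """Hypothetical record that tracks how many times it has been updated."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Validation at commit time: reject the write if another
        # transaction has changed the record since it was read.
        if read_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


room = VersionedRecord("available")

# Two transactions read the same version without taking any lock.
_, version_seen_by_a = room.read()
_, version_seen_by_b = room.read()

a_ok = room.commit("booked by A", version_seen_by_a)
b_ok = room.commit("booked by B", version_seen_by_b)

print(a_ok, b_ok, room.value)  # True False booked by A
```

Transaction B is not blocked while it works; it only finds out about the conflict when it tries to commit, and would then retry.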

Data Consistency Models

Strong Consistency: Once an update completes, every later read returns the new value, whichever location is queried.
Eventual Consistency: Updates propagate over time, allowing temporary inconsistencies.
Causal Consistency: Preserves the order of causally related updates.

Note

Strong consistency is ideal for financial transactions, while eventual consistency suits social media updates.
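
The temporary inconsistency allowed by eventual consistency can be shown with a toy model. The classes below are hypothetical: writes go to one replica and reach the others only when propagate() is called, which stands in for background synchronization.

```python
class Replica:
    """A hypothetical copy of the data held at one location."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet sent to the other replicas

    def write(self, key, value):
        # The write is applied at replica 0 and queued for the others.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Deliver queued updates, in order, to every other replica.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=2)
store.write("room_101", "booked")

stale = store.read(1, "room_101")  # replica 1 has not seen the update yet
store.propagate()
fresh = store.read(1, "room_101")  # now both replicas agree

print(stale, fresh)  # None booked
```

Under strong consistency the first read from replica 1 would already have returned "booked".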

Data Partitioning

Data Partitioning divides the database into smaller, manageable sections.
Common partitioning methods:
1. Range Partitioning: Based on value ranges (e.g., dates).
2. Hash Partitioning: Uses a hash function to distribute data.
3. List Partitioning: Based on specific values (e.g., regions).
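
The three methods differ only in how a row is mapped to a partition. A minimal sketch, assuming hypothetical partition names and a hypothetical country-to-region table:

```python
import hashlib
from datetime import date


def range_partition(booking_date):
    """Range partitioning: one partition per year of the booking date."""
    return f"bookings_{booking_date.year}"


def hash_partition(customer_id, partition_count):
    """Hash partitioning: a stable hash spreads keys across partitions."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


# Hypothetical lookup table for list partitioning.
REGION_TO_PARTITION = {"UK": "europe", "France": "europe", "Japan": "asia"}


def list_partition(country):
    """List partitioning: each listed value is assigned to a partition."""
    return REGION_TO_PARTITION[country]


print(range_partition(date(2024, 5, 17)))  # bookings_2024
print(hash_partition(42, 4))               # an index between 0 and 3
print(list_partition("Japan"))             # asia
```

The sketch uses hashlib rather than Python's built-in hash() so that the same key maps to the same partition on every run.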


Definition

Distributed Database: A collection of data that is stored across multiple physical locations but appears as a single database to users.

Analogy

Think of a distributed database like a library with books stored in different branches, but a catalog that makes it seem like all books are in one place.

Example

Google's global search database is a distributed database, with data stored in data centers around the world.

A3.4.4 Features of Distributed Databases Notes
