📕 Concurrency Control
Concurrency control is a mechanism used in database management systems (DBMS) to manage the access of multiple transactions to the same data concurrently. It ensures that concurrent transactions do not produce incorrect results, maintain consistency and correctness of the data in the database, and preserve the isolation property of transactions.
Concurrency control is essential in modern database systems because of the increased demand for simultaneous access to shared data by multiple users or applications. It is achieved through different techniques, such as locking, timestamping, and optimistic concurrency control, and helps in improving the performance, reliability, and scalability of the database system.
âš¡ Concurrency Usage
Here are some points summarizing why we should use concurrency control in database systems:
In multi-user environments, multiple transactions may access and modify the same data concurrently, leading to potential conflicts and inconsistencies in the data.
Concurrency control techniques ensure that transactions do not interfere with each other, and data remains in a consistent state.
Without concurrency control, concurrent access to shared data can lead to lost updates, dirty reads, inconsistent analysis, inconsistent retrievals, deadlocks, and starvation.
Lock-based concurrency control techniques prevent conflicts by allowing transactions to acquire locks on the data they need to access, ensuring that only one transaction can modify the data at a time.
Timestamp-based concurrency control techniques use timestamps to order transactions and ensure that transactions access data in a consistent order, preventing conflicts and ensuring data consistency.
Validation-based concurrency control techniques check the effects of transactions before allowing them to commit, preventing conflicts and ensuring data consistency.
📕 Protocols of Concurrency Control
There are different types of concurrency control protocols that are used in database management systems to ensure the correctness and consistency of data in the presence of multiple concurrent transactions. Here are some of the most common types of concurrency control protocols:
âš¡ Lock-Based Concurrency Control
This protocol is based on the concept of locks, which are used to prevent two transactions from accessing the same data item simultaneously. Locks can be acquired for either read or write access, and a transaction can only proceed if it has the required lock on the data item. Lock-based concurrency control ensures that transactions do not interfere with each other and that their execution appears to be serialized.
There are different types of lock-based concurrency control protocols, including:
Two-Phase Locking (2PL): In this protocol, transactions acquire locks in two phases: an expanding phase and a shrinking phase. During the expanding phase, transactions can acquire locks on new resources, but cannot release any locks. During the shrinking phase, transactions can release locks, but cannot acquire any new locks. This protocol guarantees serializability and ensures that no transaction can read or modify data that has been locked by another transaction.
Strict Two-Phase Locking (Strict 2PL): In this protocol, transactions acquire locks on all the resources they need at the beginning of the transaction and hold them until the end of the transaction. This ensures that no other transaction can access the locked resources until they are released, which guarantees serializability. However, this protocol can result in deadlocks if two or more transactions hold locks on resources that the other transactions need.
Rigorous Two-Phase Locking (Rigorous 2PL): This protocol is similar to Strict 2PL, but with an additional rule that all locks must be released at the end of the transaction, not just the ones acquired during the transaction. This ensures that no transaction can hold any locks after it completes, which eliminates the possibility of deadlocks.
Binary Lock: A binary lock is a simple lock that either allows or denies access to a data item. A binary lock can be in one of two states: locked or unlocked. When a transaction wants to access a data item, it requests a lock on the item. If the item is already locked by another transaction, the requesting transaction has to wait until the lock is released. When a transaction releases a lock on a data item, another transaction can request the lock.
Shared and Exclusive Locks: In this protocol, transactions can acquire either shared or exclusive locks on resources. Shared locks allow multiple transactions to read the same resource simultaneously, while exclusive locks allow only one transaction to read or modify a resource at a time. This protocol allows for more concurrency than Strict 2PL, but can still result in deadlocks.
Multiple Granularity Locking: This protocol allows transactions to acquire locks at different levels of granularity, such as page-level or record-level locks. This allows transactions to lock only the resources they need, which can improve concurrency. However, managing locks at different levels of granularity can be complex and can lead to deadlocks.
âš¡ Timestamp-Based Concurrency Control
This protocol assigns a unique timestamp to each transaction when it starts. The timestamp is used to determine the order of execution of transactions, and a transaction can only read data that has been committed by transactions with an earlier timestamp. Timestamp-based concurrency control ensures that transactions are executed in a serializable order and that the database remains consistent and correct.There are two types of timestamp-based concurrency control protocols:
Timestamp ordering (TO): This protocol assigns a unique timestamp to each transaction when it begins its execution. When a transaction tries to access a data item, the system checks the timestamp of the transaction against the timestamp of the last transaction that accessed the data item. If the timestamp of the accessing transaction is greater than the timestamp of the last transaction, the access is allowed, and the transaction updates the timestamp of the data item to its own timestamp. Otherwise, the access is denied, and the transaction is rolled back.
Multiversion timestamp ordering (MVTO): This protocol is a variant of TO that allows multiple versions of a data item to coexist in the database, each with its own timestamp. When a transaction tries to read a data item, the system selects the version of the data item with the largest timestamp that is less than or equal to the timestamp of the transaction. If the transaction tries to write a data item, the system creates a new version of the data item with a new timestamp, and updates the transaction's timestamp to the new timestamp. If the transaction tries to write a data item that has already been updated by a more recent transaction, the system rolls back the transaction.
âš¡ Validation Based Protocol
Validation-Based Protocol is a concurrency control technique used in database systems to ensure that only valid transactions are committed. The validation-based protocol consists of two main phases:
Read Phase: In the read phase, a transaction reads the database to collect all the data it needs to perform its operation. Each transaction is assigned a unique timestamp when it starts, which is used to determine the order in which transactions are processed. During the read phase, the transaction takes a snapshot of the database state, which represents a consistent view of the database at a particular point in time.
Validation Phase: In the validation phase, the transaction is validated to ensure that it can commit without violating the consistency of the database. The validation process checks whether any other transaction has modified the data that the transaction read during the read phase. If any other transaction has modified the data, the transaction is considered invalid, and it has to be rolled back. If the transaction passes the validation process, it can commit its changes to the database.
The validation-based protocol can be further divided into two types of validation:
Conservative Validation: In conservative validation, a transaction is only allowed to commit if it can be validated that it has not read or modified any data that has been modified by any other transaction with a later timestamp.
Optimistic Validation: In optimistic validation, a transaction is allowed to commit if it can be validated that it has not modified any data that has been modified by any other transaction with a conflicting timestamp. If the transaction fails the validation process, it is rolled back, and the transaction is restarted.
Different types of protocols, such as lock-based, timestamp-based, optimistic, multi-version, and two-phase locking, offer different trade-offs between performance, scalability, and reliability, and must be chosen based on the requirements of the specific application.
✔ Concurrency Important Mistakes that may occur
Race conditions
A race condition occurs when two or more threads access shared data and try to modify it simultaneously, resulting in unpredictable behavior. This can lead to data corruption, program crashes, or other unexpected results. Here's an example of a race condition:
int count = 0;
class MyThread implements Runnable {
public void run() {
for (int i = 0; i < 1000; i++) {
count++;
}
}
}
// Create two threads that access the shared count variable
Thread thread1 = new Thread(new MyThread());
Thread thread2 = new Thread(new MyThread());
// Start both threads
thread1.start();
thread2.start();
// Wait for both threads to finish
thread1.join();
thread2.join();
// Print the final count value
System.out.println("Count: " + count);
In this example, two threads increment a shared count variable by 1000 each. However, because there is no synchronization or locking mechanism in place, both threads may access the variable simultaneously, resulting in a race condition. As a result, the final count value may be less than the expected value of 2000.
To avoid race conditions, developers need to use locking mechanisms or atomic operations to ensure that only one thread can access the data at a time.
Deadlocks
A deadlock occurs when two or more threads are waiting for each other to release a lock, resulting in a situation where none of the threads can proceed. This can cause the application to become unresponsive or crash. Here's an example of a deadlock:
class BankAccount {
private int balance;
private int id;
public BankAccount(int id, int balance) {
this.id = id;
this.balance = balance;
}
public void deposit(int amount) {
synchronized (this) {
balance += amount;
}
}
public void withdraw(int amount) {
synchronized (this) {
balance -= amount;
}
}
public int getBalance() {
return balance;
}
public int getId() {
return id;
}
public void transfer(BankAccount to, int amount) {
synchronized (this) {
synchronized (to) {
withdraw(amount);
to.deposit(amount);
}
}
}
}
// Create two bank accounts
BankAccount account1 = new BankAccount(1, 1000);
BankAccount account2 = new BankAccount(2, 2000);
// Create two threads to transfer money between accounts
Thread thread1 = new Thread(() -> {
account1.transfer(account2, 500);
});
Thread thread2 = new Thread(() -> {
account2.transfer(account1, 1000);
});
// Start both threads
thread1.start();
thread2.start();
// Wait for both threads to finish
thread1.join();
thread2.join();
// Print the final balances of both accounts
System.out.println("Account 1 balance: " + account1.getBalance());
System.out.println("Account 2 balance: " + account2.getBalance());
In this example, two threads attempt to transfer money between two bank accounts. However, because the threads acquire locks on the accounts in different orders, a deadlock can occur. For example, if thread 1 acquires a lock on account 1 and thread 2 acquires a lock on account 2, both threads will be waiting for each other to release their locks, resulting in a deadlock.
To prevent deadlocks, developers need to ensure that locks are acquired and released in the same order across all threads.
Overusing synchronization
Synchronization is a useful mechanism for ensuring thread safety, but overusing it can lead to performance issues and even deadlocks. Overuse of synchronization occurs when threads are unnecessarily blocked or synchronized, leading to decreased concurrency and increased contention.
For example, consider a scenario where a shared resource is guarded by a lock. If multiple threads continuously acquire and release the lock, it can lead to performance issues due to the overhead involved in acquiring and releasing the lock. Additionally, if one thread holds the lock for an extended period, it can lead to other threads being blocked, leading to decreased concurrency.
To avoid overusing synchronization, it is important to use it judiciously and only where required. Locks should be held for the shortest possible time to reduce contention and increase concurrency. It is also essential to ensure that the shared resource is designed to minimize the need for locking wherever possible.
Furthermore, other synchronization mechanisms such as semaphores and barriers should be used only where necessary and with proper care. Overusing synchronization can lead to decreased performance, increased contention, and even deadlocks, which can severely impact the system's overall efficiency and effectiveness.
Not using Atomic operations
Atomic operations are those operations that can execute without any interference from other concurrent operations. Failing to use atomic operations can result in inconsistencies in data. For example, consider a bank account balance update operation that involves subtracting a certain amount from the current balance. If two concurrent transactions perform this operation simultaneously, it can lead to inconsistent results. For instance, both transactions might read the same balance, subtract the same amount, and update the balance, resulting in an incorrect final balance.
To avoid such issues, atomic operations should be used, which guarantees that the operation will be executed without any interference. In the above example, a single atomic operation can be used to update the balance, which will ensure that the balance update is done atomically, preventing inconsistencies.
Not understanding the workload
One of the common concurrency mistakes is not understanding the workload. It is essential to understand the concurrency requirements of the workload and design the system accordingly. For example, if the workload is read-intensive, the system should be designed to handle multiple concurrent reads without conflicts. On the other hand, if the workload is write-intensive, the system should be designed to handle multiple concurrent writes without conflicts.
Not understanding the workload can result in a poorly designed system that is not capable of handling the required concurrency. This can lead to performance issues, scalability problems, and even data inconsistencies. Therefore, it is crucial to understand the workload and design the system accordingly.
✔ Disadvantages of Concurrency
Concurrency in database systems can lead to several potential problems, including:
Lost updates: This occurs when two or more transactions try to update the same data concurrently, and one transaction overwrites the changes made by another transaction, resulting in the loss of the changes made by the overwritten transaction.
Dirty reads: This occurs when a transaction reads data that has been modified by another transaction, but that transaction has not yet committed its changes. If the modifying transaction rolls back its changes, the read transaction will have read data that was never committed to the database.
Inconsistent analysis: This occurs when a transaction reads data multiple times, and the data is modified by another transaction between reads. This can lead to inconsistent analysis results, where the transaction's analysis of the data is based on different versions of the data.
Inconsistent retrievals: This occurs when a transaction reads data that is in the process of being modified by another transaction, leading to a retrieval of inconsistent data.
Deadlocks: This occurs when two or more transactions are waiting for each other to release resources they need to complete their operation. Deadlocks can lead to a complete halt in the system, as the transactions cannot proceed.
Starvation: This occurs when a transaction is constantly blocked from accessing resources it needs, and therefore cannot proceed. This can happen when other transactions hold locks on resources that the blocked transaction requires.
Overall, concurrency control is essential for maintaining data integrity and consistency in multi-user database systems, ensuring that transactions do not interfere with each other and that data remains accurate and up-to-date.
Thanks for reading, and happy coding!
DBMS Concurrency Control: Techniques and Best Practices -> Exploring the Database Keys: Understanding Their Significance