By the end of this unit, you should be able to
- Explain serializability, confict serializability.
- Apply conflict serializability check to schedule.
- Explain Strict 2PL and 2PL
- Apply Strict 2PL and 2PL to generate conflict serialiable schedules.
Recall from the previous unit, we learn that the isoliation property ensures that multiple concurrent transactions must not be dependent on one another. In other words, if the DBMS executing these concurrent transactions by interleaving, the result be the same as executing them in a sequential order.
Suppose we have a set of tasks
We call a sequence of distinct instructions
- for all
$i\in{1,n}$ , for all$p \in {1, m_i}$ , we find$c^i_p \in {c_1,...,c_k}$ . - for all
$q \in {1,k}$ , there exist$i\in{1,n}$ and$p \in {1, m_i}$ such that$c_q = c^i_p$ . - for all
$q \in {1,k}$ and$q' \in {q,k}$ , there not exist$i\in{1,n}$ and$p \in {1, m_i}$ and$p' \in {p, m_i}$ such that$c_q = c^i_{p'}$ and$c_{q'} = c^i_p$ .
The first rule enforces all instructions in the tasks can be found in the schedule. The second rule enforces all instructions in the schedule can be found in the tasks. The third rule enforce that the relative order between instructions from the same task are maintained in the schedule.
Given tasks
- there not exists a sub-sequence
$c_{q-1}, c_{q}, c_{q+1}$ of$c_1,...,c_k$ such that$c_{q-1}$ and$c_{q+1}$ are instructions from task$t_j$ and$c_q$ is an instruction from task$t_i$ , for some$i$ and$j$ in${1,n}$ .
In other words, a schedule is serial iff there is no interleaving.
Note that given a set of tasks
Given two database states
Inductively, we can define
Given tasks
In other words, a schedule is serializable if its effect is the same as a serial schedule after execution.
For example, consider the following transaction
-- transaction 1
begin;
select @bal1 := bal from saving where id=1001; -- c1
update saving set bal = @bal1-100 where id=1001; -- c2
commit;
-- transaction 2
begin;
select @bal2 := bal from saving where id=1001; -- c1'
update saving set bal = @bal2*1.05 where id=1001; -- c2'
commit;
The schedules c1,c2,c1',c2'
and c1',c2',c1,c2
are serializable schedules, but c1,c1',c2,c2'
is not a serializable schedule, we illustrate the state and instruction execution as follows
state before | instruction | state after |
---|---|---|
100 | c1 | 100, bal1 = 100 |
100,bal1 = 100 | c1' | 100, bal1 = 100, bal2 = 100 |
100, bal1 = 100, bal2 = 100 | c2 | 0, bal1 = 100, bal2 = 100 |
0, bal1 = 100, bal2 = 100 | c2' | 105, bal1 = 100, bal2 = 100 |
At the end of execution, we found the balance of account 1001
is 105 though we withdraw 100 from this account in transaction1.
Our dream approach is to be able to check whether the given schedule is serializable without executing it. However such a check is very expensive because
- there are
$n!$ possible serial schedules we need to verify the given one against. - we need to model the set of possible operations and the value domains (without executing them).
Our second best approach is to use a conservative approximation check, say
Note that we don't need to guarantee the inverse direction still holds (since
One possible of consevative approximation is Conflict Serializability Check.
From this point onwards, we sometimes refer
- an instruction
$c$ as an operator, which could be$R(A)$ or$W(A)$ where$R(\cdot)$ stands for the read operator, and$A$ is the object that being read,$W(\cdot)$ denotes a write operator. - a task
$t$ as an transaction$T$ . - a state
$s$ is a mapping from objects, such as$A$ ,$B$ , etc to values.
Two instructions
-
$c_1$ and$c_2$ belong to different transactions and -
$(c_1 = R(A) \wedge c_2 = W(A))$ or$(c_1 = W(A) \wedge c_2 = R(A))$ or$(c_1 = W(A) \wedge c_2 = W(A))$ for some common object$A$ in the state.
Given a schedule
Intuitively,
Given two schedules
- both schedules belong to the same set of transactions.
- for any pair of conflicting operators
$(c_i, c_j)$ in$X$ , such that$c_i \prec c_j$ , we find$(c_k', c_l')$ in$X'$ such that$c_k' \prec c_l'$ and$c_i = c_k'$ and$c_j = c_l'$ .
In other words, the second condition in the above definition says that all conflicting operators in
Given a set of tasks
Consider the following example with two transactions
T1: R(A), W(A), R(B), W(B)
T2: R(A), W(A), R(B), W(B)
Let's say
T1:R(A), W(A)
is to incrementA
's value by 10.T1:R(B), W(B)
is to incrementB
's value by 10.T2:R(A), W(A)
is to doubleA
's value.T2:R(B), W(B)
is to doubleB
's value.
Note that there are 2! = 2 possible serial schedules.
- Serial schedule 1
T1 | T2 |
---|---|
R(A) | |
W(A) | |
R(B) | |
W(B) | |
R(A) | |
W(A) | |
R(B) | |
W(B) |
assuming the initial state is {A:10, B:10}
, the final state of the above is {A:40, B:40}
.
- Serial schedule 2
T1 | T2 |
---|---|
R(A) | |
W(A) | |
R(B) | |
W(B) | |
R(A) | |
W(A) | |
R(B) | |
W(B) |
assuming the initial state is {A:10, B:10}
, the final state of the above is {A:30, B:30}
.
Now let's consider the following schedule (let's call it schedule 3)
T1 | T2 |
---|---|
R(A) | |
W(A) | |
R(A) | |
W(A) | |
R(B) | |
W(B) | |
R(B) | |
W(B) |
Note that the above schedule is not conflict serializable, because
- conflicting operator pair
$T2:W(B) \prec T1:R(B)$ cannot be reordered to obtain serial schedule 1, and - conflicting pair
$T1:W(A) \prec T2:R(A)$ cannot be reordered to obtain serial schedule 2.
If we were to execute the above non-conflict serializable schedule with the init state {A:10, B:10}
, we get {A:40, B:30}
.
Recall that conflict serialiability is a conservative approximation of serialzability. Suppose we change the operation of T2
T2:R(A), W(A)
is to increaseA
's value by 20.T2:R(B), W(B)
is to increaseB
's value by 20.
Given the init state {A:10, B:10}
, serial schedule 1, serial schedule 2 and schedule 3 generate {A:30, B:30}
, which means that after the change of the operation of T2
, schedule 3 becomes serializable, but still not conflict serializable.
In the above example, we illustrate the Conflict Serializable check via a table and manual checking.
To implement the idea as an algorithm we could leverage on a directed graph, called precedence graph.
The precedence graph is generated from a schedule
- For each transaction
$t$ , create a node$n_t$ . - For each pair of transactions
$t$ and$t'$ , if there exists a conflicting pair of operators$t:c \prec t':c'$ in$X$ , we draw an directed arc from$n_t$ to$n_{t'}$ .
A schedule
For instance, the precendence graph of the schedule 3 in the previous example is as follows
graph
T1 --> T2
T1 --> T2
T2 --> T1
T2 --> T1
Generate schedule in a bruteforce way then check for conflict serializablity is still impractical. DBMSes use Two Phase Locking (2PL) to generate conflict serializable schedule.
Let's consider a strict version of two phase locking is called Strict 2PL. It is governed by the following two rules.
- When a transaction need to read an object
$A$ , it needs to acquire a shared lock$S(A)$ ; when it need to write an object$A$ , it needs to aquire an exclusive lock$X(A)$ . A shared lock can be shared by multiple reader transactions. An exclusive lock cannot be shared. - All locks are only released
$U(A)$ , when the transaction commit.
The requests for lock
Let's consider the earlier example with with two transactions
T1: R(A), W(A), R(B), W(B)
T2: R(A), W(A), R(B), W(B)
Suppose T1
starts slightly earlier than T2
. A schedule is generated with strict 2PL as follows
Timestep | T1 | T2 |
---|---|---|
1 | X(A) | |
2 | R(A) | |
3 | W(A) | |
4 | X(B) | |
5 | R(B) | |
6 | W(B) | |
7 | U(A), U(B), Commit | |
8 | X(A) | |
9 | R(A) | |
10 | W(A) | |
11 | X(B) | |
12 | R(B) | |
13 | W(B) | |
14 | U(A), U(B), Commit |
- At timestep 1,
T1
acquires the exclusive lock onA
since it needs to read and write to it.T2
is blocked as it would also need to acquire an exclusive lock onA
to start. - At timestep 4,
T1
acquires the exclusive lock onB
. - At timestep 7,
T1
is commited, hence both locks are released. - At timestep 8,
T2
acquires lock onA
and starts.
Once issue with Strict 2PL is that a transaction holds on to the locks it acquires until it commits, this blocks off other possible concurrent execution opportunity, hence throughput might be affected.
A variant of Strict 2PL, known as 2PL, is governed by the following two rules. (First rule is the same as Strict 2PL, the second rule is different).
- When a transaction needs to read an object
$A$ , it needs to acquire a shared lock$S(A)$ ; when it needs to write an object$A$ , it needs to aquire an exclusive lock$X(A)$ . A shared lock can be shared by multiple reader transactions. An exclusive lock cannot be shared. - A transaction releases the lock right after usage. A transaction cannot acquire new lock once it releases some lock.
The reason why this is called a two phase locking is that there is a phase of lock acquisition, the number of locks being acquired by a transaction grows until it starts to release some lock, then it will release all the locks gradually.
Given the same example, a schedule is generated with 2PL as follows
Timestep | T1 | T2 |
---|---|---|
1 | X(A) | |
2 | R(A) | |
3 | W(A) | |
4 | X(B) | |
5 | U(A) | |
6 | X(A) | |
7 | R(A) | |
8 | W(A) | |
9 | R(B) | |
10 | W(B) | |
11 | U(B), Commit | |
12 | X(B) | |
13 | U(A) | |
14 | R(B) | |
15 | W(B) | |
16 | U(B), Commit |
- At timestep 1,
T1
acquires an exclusive lock onA
and starts,T2
is blocked and waits. - At timestep 4,
T1
is done withA
, but needs to acquire an exclusive lock onB
before releasingA
. - At timestep 6,
T2
acuiqres an exclusive lock onA
and starts and runs until timestep 8, it needs to acquire an exclusive lock onB
but it is blocked. - At timesteps 9-10,
T1
operates onB
. - At timestep 11,
T1
releases lock onB
and is commited. - At timestep 12,
T2
is unblocked and acquires the exclusive lock onB
, it then releases lock onA
and continues with the rest of operations.
Note that schedules generated by both Strict 2PL and 2PL are conflict serializable. (Proof omitted.)
2PL in general yields a better throughput compared to Strict 2PL by allowing some transaction to start early. Such gain comes with a price. 2PL might produce schedules that are harder to recover due to cascading abort.
Consider the following schedule generated by 2PL, we keep the begin and commit explicit.
Timestep | T1 | T2 |
---|---|---|
1 | begin | |
2 | X(A) | |
3 | A=10 | |
4 | X(B) | |
5 | U(A) | |
6 | begin | |
7 | X(A) | |
8 | A=A+1 | |
9 | U(A) | |
10 | commit |
Assume the initial state is A=0
, executing the above schedule yields the following Undo/Redo Log
1. <T1 begin>
2. <T1 update A 0 10>
3. <T2 begin>
4. <T2 update A 10 11>
5. <T2 commit>
After timestep 11, the database crashes.
During the recovery phase, we start with undo phase, we find that T1 need to be undone and aborted.
However, T2's effect of A=A+1
is based on the assumption that A=10
in T1 was successful. As a result, T2 has be to aborted and undone too. This is known as cascading abort.
Another variant of Strict 2PL is called Conservative 2PL. It is same as Strick 2PL except that a transaction acquires all the locks it needs at the very begining.
For instance, given the previous example, a schedule is generated with conservative 2PL as follows
Timestep | T1 | T2 |
---|---|---|
1 | X(A),X(B) | |
2 | R(A) | |
3 | W(A) | |
4 | R(B) | |
5 | W(B) | |
6 | U(A), U(B), Commit | |
7 | X(A),X(B) | |
8 | R(A) | |
9 | W(A) | |
10 | X(B) | |
11 | R(B) | |
12 | W(B) | |
13 | U(A), U(B), Commit |
Note that all the locks are acquired by a transaction in an atomic operation, i.e. all or none. This is a way to prevent deadlock.
Let's consider how deadlock can be caused by 2PL (and Strict 2PL).
Consider the following transactions
T1: R(A), W(A), R(B), W(B)
T2: R(B), W(B), R(A), W(A)
and a schedule
Timestep | T1 | T2 |
---|---|---|
1 | X(A) | |
2 | R(A) | |
3 | W(A) | |
4 | X(B) | |
5 | R(B) | |
6 | W(B) | |
7 | try X(B) but blocked | try X(A) but blocked |
- At timestep 1,
T1
acquires an exclusive lock onA
and starts - At timestep 4,
T2
acquires an exclusive lock onB
and starts - AT timestep 7,
T1
tries to get a lock onB
but it gets blocked.T2
tries to get a lock onA
but it is blocked too. A deadlock arises.
A way to detect deadlock is that when transactions waiting for locks, the lock manager (part of DBMS) generate a wait-for graph.
In a wait for graph,
- each active transaction
$t$ is a node$n_t$ . - an arc going from
$n_t$ to$n_t'$ if transaction$t$ is waiting for a lock which is being held by$t'$ .
For insance, the wait-for graph from the previous example is
graph
T1 --> T2
T2 --> T1
To prevent deadlock, there are several approaches
- Use conservative 2PL, but throughput will be affected.
- Set priorities. Transactions with higher priorty get the lock. Transaction wth lower prioerty get aborted and retry.