PostgreSQL transactions behind the scenes - MVCC, locks, isolation levels
Content
- MVCC
- Xmin and Xmax
- VACUUM command.
- Locks
- Transaction isolation levels
Used tools:
- PostgreSQL setup with Docker: https://hub.docker.com/_/postgres
- pgAdmin: https://www.pgadmin.org/
MVCC
PostgreSQL uses multiversion concurrency control (MVCC) architecture to implement transactions with proper isolation levels. MVCC makes concurrent access possible and safe by maintaining several versions of a row. Row is enriched with additional metadata which is xmin and xmax. Lets take a look at the example:
Xmin and xmax
I will insert a new row into the table:
We can do a standard select:
With followig result:
Nothing unexpected. But lets see the result of SELECT that explicitly mentions xmin and xmax columns:
These columns were not added by me when creating the table. They are handled by MVCC mechanism in PostgreSQL. As the official documentation states:
- xmin - The identity (transaction ID) of the inserting transaction for this row version. (A row version is an individual state of a row; each update of a row creates a new row version for the same logical row.)
- xmax - The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version. That usually indicates that the deleting transaction hasn't committed yet, or that an attempted deletion was rolled back.
We can see that xmin is already initialised with id of transaction that was doing the initial SQL INSERT. The xmax is 0 since the row was not deleted by any transaction yet.
Lets do an update of the row:
Now check xmin and xmax again:
As you can see xmin was updated with an id of transaction that performed SQL UPDATE statement.
Concurrent transactions - writer and reader
OK, so we already know that MVCC uses xmin and xmax and we've seen how these columns are updated when SQL statements are performed consecutively. But the main strength and purpose of MVCC is to handle concurrent transactions by providing proper isolation levels. Lets then take a look at a little bit more complex scenarios.
Lets start with first transaction:
We use BEGIN instruction to start a new transaction. Then selecting its ID using special PostgreSQL txid_current function.
Lets see what is the ID:
We've received an ID, lets now check the result of INSERT followed by SELECT statement (inside the same transaction):
As we can see the xmin value is the same as ID of transaction that has inserted the row.
Before comitting all the changes lets take a look what will be the result of the same SELECT statement but performed inside a different transaction:
As you see a new row is not visible since the transaction that inserted it is not commited yet.
So lets COMMIT it now:
And perform SELECT again:
Now the row is visible since the transacion is already commited.
Concurrent transactions - two writers
Lets see what will happen in the scenario where two concurrent transactions are performing update statement on the same row.
We start with one employee in the table:
Now lets BEGIN first transaction (lets call it transaction A) with UPDATE statement:
We have increased employee's payment by 1000 - Carl is getting a raise.
Lets see what is the result of SELECT statement performed now by transaction A:
Nothing surprising - there is a new xmin value corrseponding to transaction A ID.
Now lets BEGIN transaction B (transaction A is not COMMITed yet).
We will start with SELECT statement first:
Lets see the result:
Employee payment has an old value (with old xmin) since transaction B has still an access to the previous row state - transaction A has not COMMITed yet. We can also see that xmax is set and has ID of transaction A. In this scenario it means that transaction A has UPDATEd this row but not COMMITed yet.
What is the purpose of the xmax value? Lets see what happens when transaction B tries to perform an UPDATE (giving Carl another raise!):
It gets blocked since there is another concurrent UPDATE happening which is not COMMITed yet.
Lets now COMMIT transaction A:
UPDATE in transaction B is now unblocked and can be completed. The UPDATE perfomed by Transaction B is being performed against the most recent row state since Transaction A is already COMMITed! After both transactions are finished lets now check employees table content:
There are two important things to notice:
- Payment value is correct (1000 + 1000 + 500 = 2500). It was possible since the second UPDATE was blocked until the first UPDATE is commited which means that the second UPDATE was using the most recent row state.
- Xmin value contains ID of Transaction B. Transaction B was the last one which has updated the row (which means deleted previous version and created a new one).
Vacuuming
MVCC with xmin and xmax column has its disadvantages:- As it was mentioned UPDATE or DELETE are in fact creating a new row. Previous version of a row is not physically deleted. It means that number of deleted rows will be growing and it will happen despite the fact that those rows might not be visible to transactions anymore.
- Xmin and Xmax values (transaction IDs) can not be incremented forever. They are 32 bit integers.
There is a special PostgreSQL command used to resolve both problems: VACUUM. PostgreSQL documentation definies it in following way:
VACUUM reclaims storage occupied by dead tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
Lets test it in practice. We have a following row in employees table:
Lets perform a VACUUM FULL command. FULL-mode reclaims more space than plain VACUUM:The employee row has following xmin and xmax values now:
As we can see xmax was removed. That was no longer needed since the transaction that has UPDATED/DELETED the row no longer exists.
- As it was mentioned UPDATE or DELETE are in fact creating a new row. Previous version of a row is not physically deleted. It means that number of deleted rows will be growing and it will happen despite the fact that those rows might not be visible to transactions anymore.
- Xmin and Xmax values (transaction IDs) can not be incremented forever. They are 32 bit integers.
There is a special PostgreSQL command used to resolve both problems: VACUUM. PostgreSQL documentation definies it in following way:
Locks
Locks together with MVCC row multiversion model are used in order to control transactions isolation. Locks can be acquired either automatically when specific SQL is performed or can be acquired manually. Manual locking is a way for applications to do custom isolation and concurrency control based on their logic. Both tables and individual rows can be locked. Here is a link to PostgreSQL documentation which contains a broader summary: https://www.postgresql.org/docs/current/explicit-locking.html
Lets take a look at locks in practice. I am going to BEGIN a transaction and perform an UPDATE statement on whole employees table:
And now invoke a command to retrieve lock data:
Result is following:
As we can see there is a RowExclusiveLock acquired. It is a popular lock acquired automatically by commands that modify data. It prevents other concurrent modifiactions.
COMMITting transaction releases all locks.
Please remember that SELECT statements do not automatically acquire locks in PostgreSQL. There are however commands to do it manually (for example SELECT FOR UPDATE).
When READ COMMITED is not enough - other isolation levels
Our previous experiments were run on READ COMMITED isolation level. While some of the cases work correct from data consistency point of view, others might not.
For example lets assume a following scenario:
BEGIN | BEGIN |
SELECT * FROM employees WHERE employeeid = 100; | |
SELECT * FROM employees WHERE employeeid = 100; | |
UPDATE employees SET payment = 4000 WHERE employeeid = 100; | |
COMMIT; | |
UPDATE employees SET payment = 5000 WHERE employeeid = 100; | |
COMMIT; |
So what is happening here? There are two transactions. Both are SELECTing the same employee row at the begginning. Second one is setting a new value for the payment and doing COMMIT. After that the first transaction is doing its own update setting a different value as a payment and then performs COMMIT. First transaction has overwritten UPDATE performed by the second one.
The anomaly above is called LOST UPDATE. There are three other 'popular' anomalies: DIRTY READ, NON-REPEATABLE READ and PHANTOM READ. We also have two more restrictive transaction isolation levels available which can solve consequent anomalies.
We can depict all the isolation levelsin following way:
Dirty read | Lost Update | Non repeatable read | Phantom read | |
Read commited | ✔ | ✕ | ✕ | ✕ |
Repeatable read | ✔ | ✔ | ✔ | ✕ |
Serializable | ✔ | ✔ | ✔ | ✔ |
It is worth to mention that PostgreSQL has 3 isolation levels while many other databases have 4. The additional one is Read Uncommited which allows all four anomalies mentioned in the table above. Due to MVCC transactions implementation it is not really possible to achieve this isolation level.
An obvious question would be why not to use Serializable all the time? The more restrictive the isolation level the worse is the performance of concurrent transactions. For instance Serializable requires transactions to be performed one after another (no concurrent actions).
Summary
We have taken a look at the MVCC mechanism and how it works. MVCC creates multiple versions of data item and flaggs them with additional metadata, for example xmin and xmax, which help to determine who was creating or modifing the row. It makes it possible for multiple transactions to work on the same data concurrently.
Its worth to remember about a VACUUM command which acts as a resolver of potential issues related to MVCC mechanism.
We have checked locking mechanism which can be acquired either automatically or manually if more customized concurrency controll is required.
At the end we summarised transaction izolation levels available in PostgreSQL.
Komentarze
Prześlij komentarz