Separating Data nodes over WAN
Although this is possible (new timeout handling was introduced in 7.2 to support separating data nodes within one cluster over geographical distance), we do not recommend it unless the link between the two sites has close to LAN characteristics, or you accept that writes will be slow. The reasoning is outlined below.
The reasons we typically don't recommend separation of data nodes are:
- Writes will be slow due to the Two Phase Commit (2PC) protocol (some simplified theory below).
- Recovery of a failed node will be slower than on a LAN
Normally, ping times between machines are around 80-120us on a Gig-E LAN (servers connected to the same switch).
Somewhat simplified:
When performing a write on the cluster a Two Phase Commit protocol is involved. If you have node A on site 1 and node B on site 2, let's call them A1 and B2.
When a transaction is started, let's pretend that the Transaction Coordinator (TC) on A1 will handle the transaction. As part of the 2PC protocol, node A1 will send a PREPARE message to node B2 (one network hop). Node B2 will then send a PREPARE_OK back to the TC on node A1 (another network hop).
Next is the COMMIT phase: the TC on A1 will send a COMMIT to B2 (network hop), and B2 will send a COMMIT_OK back to A1 (network hop).
In total, 4 network hops for the writes.
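The message exchange above can be written out as a small Python sketch. It is only an illustration of the simplified description in this article, not the data nodes' actual implementation; the `Message` class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One network hop between two data nodes (illustrative only)."""
    sender: str
    receiver: str
    kind: str


def two_phase_commit_messages(coordinator: str, participant: str) -> list[Message]:
    """Return the simplified 2PC message sequence for a single write."""
    return [
        Message(coordinator, participant, "PREPARE"),     # phase 1 request
        Message(participant, coordinator, "PREPARE_OK"),  # phase 1 reply
        Message(coordinator, participant, "COMMIT"),      # phase 2 request
        Message(participant, coordinator, "COMMIT_OK"),   # phase 2 reply
    ]


messages = two_phase_commit_messages("A1", "B2")
for m in messages:
    print(f"{m.sender} -> {m.receiver}: {m.kind}")
print(f"network hops: {len(messages)}")
```

Every message crosses the link between the sites, which is why the link latency is paid four times per write.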
If the latency is 80us, then a write will take 4x80us = 320us, plus some extra time (much smaller than the network time) to update the memory structures. The data nodes also employ some very sophisticated batching, so if you issue many writes at the same time you will not pay the full price for each individual write.
Now if the data nodes are geo-separated, and say the latency between the sites is 60ms per hop, then you will end up with:
4x60ms = 240ms = 240 000us per write, and this is usually too slow for most use cases.
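To try the same arithmetic with your own numbers, here is a minimal Python sketch. It assumes the simplified model from this article (four network hops per write, local processing time ignored, no batching); the function and constant names are made up for illustration.

```python
HOPS_PER_WRITE = 4  # PREPARE, PREPARE_OK, COMMIT, COMMIT_OK


def write_latency_us(hop_latency_us: float, hops: int = HOPS_PER_WRITE) -> float:
    """Estimate the network cost of one write, in microseconds.

    Ignores the (much smaller) time needed to update memory structures
    and any benefit from batching.
    """
    return hops * hop_latency_us


lan_us = write_latency_us(80)       # 80us per hop on a LAN
wan_us = write_latency_us(60_000)   # 60ms per hop between sites

print(f"LAN write: {lan_us} us")
print(f"WAN write: {wan_us} us ({wan_us / 1000} ms)")
print(f"slowdown:  {wan_us / lan_us:.0f}x")
print(f"sequential writes per second over WAN: {1_000_000 / wan_us:.1f}")
```

Under these assumptions a single session that issues one write at a time manages only about four writes per second across the WAN link. Batching and parallel sessions raise the total throughput, but they do not shorten an individual write.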
We hope this gives you an idea of what to expect if you separate the data nodes. You can give it a shot if you think you have a fast, low-latency backbone between the data nodes.
If not, what should you do?
Perhaps you can look at:
- Using asynchronous replication with conflict detection and resolution.
- Using Galera instead. Writes will be slow here too, but not that slow, since Galera does not use 2PC and so sends fewer messages between the geo-separated nodes.