Constant hanging of wsrep in pre-commit stage
What does this stage mean exactly? Why are there dozens of these threads running? It gets to the point where no one can connect to the server because all connections are used up. Any assistance is appreciated.
-
Connections hanging in the pre-commit state are the result of a slow writeset certification process on the remaining nodes in the cluster. Certification has to happen before the commit, hence the pre-commit stage. There is really no one-size-fits-all solution for this problem, as the culprit differs from case to case. There are a couple of things you can try, though. For starters, if CPU utilization is within limits and I/O looks fine, you can try increasing the number of workers (wsrep_slave_threads). This may help utilize resources more fully and speed up writeset certification. Keep in mind that there is no point in raising it above the value of the 'wsrep_cert_deps_distance' status counter. Also, if wsrep_slave_threads is already set to more than twice the number of CPU cores, a further increase is unlikely to help.
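To make the two rules of thumb above concrete, here is a minimal Python sketch that turns them into a ceiling for wsrep_slave_threads. The function name and the sample numbers are made up for illustration. It assumes you have already read the counter yourself (for example with SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance') and that you know the core count of the node.

```python
import math
import os


def suggest_slave_threads_ceiling(cert_deps_distance, cpu_cores=None):
    """Return a rough upper bound for wsrep_slave_threads.

    Hypothetical helper: it only encodes the two rules of thumb from the
    answer (stay at or below wsrep_cert_deps_distance, and at or below
    twice the number of CPU cores). It does not talk to the server.
    """
    if cpu_cores is None:
        # Assumes this runs on the database node itself.
        cpu_cores = os.cpu_count() or 1
    by_distance = max(1, math.floor(cert_deps_distance))
    by_cores = 2 * cpu_cores
    return min(by_distance, by_cores)


# Sample values, not measurements from a real cluster.
print(suggest_slave_threads_ceiling(23.9, cpu_cores=8))
print(suggest_slave_threads_ceiling(5.4, cpu_cores=8))
```

Treat the result as a ceiling to experiment below, not as a recommended setting. Whether more workers help still depends on the CPU and I/O headroom mentioned above.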
Another thing you can try is setting gcs.fc_limit to a higher value. This setting determines how large the write queue may grow before writeset replication is paused. The idea is that a higher value may help absorb temporary spikes in the number of DML statements that would otherwise cause writeset replication to pause.
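For reference, gcs.fc_limit is a Galera provider option, so it is set through wsrep_provider_options rather than as a variable of its own. The small Python sketch below only builds the SQL statement you would run. The helper name and the value 64 are illustrative assumptions, not recommendations. Check the documentation for your Galera version to confirm the option can be changed at runtime on your setup.

```python
def fc_limit_statement(new_limit):
    """Build the statement that sets gcs.fc_limit via wsrep_provider_options.

    Hypothetical helper for illustration; it returns a string and does not
    connect to any server.
    """
    if isinstance(new_limit, bool) or not isinstance(new_limit, int) or new_limit < 1:
        raise ValueError("gcs.fc_limit must be a positive integer")
    return f"SET GLOBAL wsrep_provider_options = 'gcs.fc_limit={new_limit}';"


# 64 is an arbitrary example value.
print(fc_limit_statement(64))
```

A change made this way does not survive a restart, so once you settle on a value it would also have to go into the wsrep_provider_options line of the server configuration file.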
If the problem is caused by too many connections running queries at the same time, or simply by a lack of resources, there's not much that tuning can do. What you can do is implement some kind of connection pooling and limit the number of connections to the database.
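As a rough idea of what limiting connections on the application side can look like, here is a minimal Python sketch of a fixed-size pool. The class name and the connect callable are hypothetical. In a real application connect would be whatever your database driver uses to open a connection, and most drivers or proxies already ship a pool you should prefer over this one.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Hand out at most max_connections connections at any time.

    Illustrative sketch only: no reconnects, health checks or cleanup.
    """

    def __init__(self, connect, max_connections):
        # Open all connections up front and keep them in a queue.
        self._idle = queue.Queue(maxsize=max_connections)
        for _ in range(max_connections):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)


# Stand-in for a real driver call, so the example runs without a database.
pool = BoundedPool(connect=object, max_connections=2)
with pool.connection() as conn:
    pass  # run queries on conn here
```

With a pool like this, callers beyond the limit wait in the application instead of piling up as extra threads on the database server.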
As I said, it's tricky to say exactly how to solve this problem without detailed knowledge of the workload, but I hope this points you in the right direction for further investigation.