MySQL Galera node permanently stuck in state "Disconnected/Down" - any way to reset it in the cmon GUI?
3-node cluster, 1 HAProxy.
We had a few incidents yesterday where node 1 crashed.
When it was restarted, nodes 2 AND 3 decided to crash as well.
Node 2's MariaDB/MySQL could not be restarted ("Signal 11" immediately after starting recovery).
We ended up having to restore node 2 by removing MySQL and the data files, reinstalling MySQL and restoring the config files.
After the restart it performed an SST and started working again.
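For anyone curious, the rebuild was roughly the sketch below (the package name and the config backup location are placeholders, assuming a Debian-style MariaDB install; adjust to your distribution):

    # stop the broken node and move the corrupt data directory out of the way
    service mysql stop
    mv /var/lib/mysql /var/lib/mysql.broken

    # reinstall the same MariaDB version and put the saved config files back
    apt-get install --reinstall mariadb-server
    cp /root/mysql-config-backup/* /etc/mysql/   # hypothetical backup location

    # on start, the empty node rejoins the cluster and Galera runs a full SST
    service mysql start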
However, this rendered ClusterControl unable to see the node (OS).
The Galera cluster is working as it should when checked manually in MySQL/MariaDB; the only problem is that cmon lost track of this one node.
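The manual check was along these lines (a sketch; run with whatever admin credentials you use):

    # run on the rebuilt node; all three nodes report the same cluster view
    mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"
    mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"   # reports "Synced"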
Nothing was changed regarding the Linux user, and all MySQL users were recreated exactly as on nodes 1 and 3.
The cmon GUI can show everything DB-related (as far as the Community version goes),
but the node stays "orange" with status "Disconnected/Down".
It got a little better after restarting cmon and apache2 on the ClusterControl Linux server, but I am lost here now.
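(For the record, that restart was just the following on the ClusterControl server; service names assume a sysvinit/Debian-style setup:)

    service cmon restart
    service apache2 restart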
I hope someone here has a suggestion that does not include taking down the cluster or losing valuable performance history in cmon ;)
Official comment
Hi,
The best way to figure out what is going on in your case would be to open a ticket and attach an error report generated as described here:
http://support.severalnines.com/hc/en-us/articles/212425883
It contains some data which should not be made public, so our ticket system is a better place.
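On the controller host it comes down to something like the following (cluster ID 1 used as an example; use your own cluster's ID):

    s9s_error_reporter -i 1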
Thanks,
Krzysztof
I am still only a freeloader on the Community Edition. Would I be allowed to create a ticket?
However, when trying to run "s9s_error_reporter -i [clusterID]", I am getting errors from the SELECTs in your script:

    Warning: Using a password on the command line interface can be insecure.
    ERROR 1045 (28000): Access denied for user 'cmon'@'localhost' (using password: YES)

Even if I had been able to create the log, I would have a *very* hard time getting the file out of our closed network to you.
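The 1045 above suggests the script cannot authenticate as the cmon user. A quick way to test that (a sketch; /etc/cmon.cnf is, as far as I understand, where the controller stores its mysql_password):

    # the password the controller itself uses
    grep mysql_password /etc/cmon.cnf

    # try the same credentials by hand against the local cmon database
    mysql -u cmon -p -h 127.0.0.1 cmon -e "SELECT 1;"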
Can you suggest where our process with regard to ClusterControl might have failed?
We had EVERYTHING working: all 3 nodes green, Controller green.
Then, as explained, one node (node 2) permanently failed to start MySQL, and we ended up removing MySQL and its data and reinstalling the same MySQL version (with the Controller running).
Galera did a successful SST and things are working again in the MySQL/Galera part of our setup. Only ClusterControl fails to see that picture: it sees the restored database on the node, and OS performance, but it insists that the node is down/disconnected.
The repaired node (2) is now id:5 in cmon.server_node.
In that same place (cmon.server_node.properties, queried directly as sketched after the list below) we can see some differences: the node that is now id 5 (previously node id 3) has some things missing:
certsdepsdistance, clustersize, flow%, galerastatus and local% are all completely missing from "node2".
"Message": ""; (the other two are "Node is Synced")
"mysqlstatus":1 (the other two are 0)
and Status=1 where the other working are 0
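(The list above came from querying the cmon database directly, roughly like this; the properties column is what I actually looked at, the rest of the SELECT is my guess at the schema:)

    mysql -u cmon -p cmon -e "SELECT id, hostname, properties FROM server_node;"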
I hope you can see what we might have done wrong, or what we can try to nudge ClusterControl into seeing that there's nothing wrong with "node2".
Regards,
Brian