Can't add node to Galera Cluster through the UI
Hi there,
I need some help on how to add a new node to a Galera Cluster using the Cluster Control panel.
A couple of days ago I was able to use the feature, but I can't use it anymore.
The first job "Add New Node to Cluster" finishes with the last message: Successfully added node.
But the next job "Galera Node Recovery" keeps running forever showing the following messages:
- [10:14:36]: NEWNODE_IP:3306: node state is GALERA_NODE_UNKNOWN, expected state GALERA_NODE_SYNCED - waiting!
- [10:14:36]: NEWNODE_IP:3306: No privileges to connect to the node (errno: 1130, Host '<CLUSTERCONTROL_IP>' is not allowed to connect to this MySQL server)
- [10:14:36]: NEWNODE_IP: Error getting status variable wsrep_local_state: Host '<CLUSTERCONTROL_IP> is not allowed to connect to this MySQL server
I can register the node manually by provisioning the service on the node, starting it and then adding on the UI, but ClusterControl can't provision and add a new node from the UI.
All nodes can reach each other on the 3306 port.
I've checked the mysql config file and it looks fine.
I can see the following message on the new node logs:
- 2017-11-01 13:57:46 19954 [Warning] IP address '<CLUSTERCONTROL_IP>' could not be resolved: Name or service not known
- 2017-11-01 14:01:49 19954 [Warning] IP address '<OLDNODE_IP>' could not be resolved: Name or service not known
Any suggestion on how to fix this problem, please?
Thanks,
Francisco Andrade
Official comment
Hi,
Are ports 4567 and 4568 also open (required by Galera)?
Also, is the /etc/hosts file set up on:
- OLDNODE_IP
- CLUSTERCONTROL_IP
- NEWNODE_IP
Are the other nodes using hostnames or IPs? Can you check if they have skip_name_resolve set?
Also, what version of ClusterControl are you using? Please run on the controller:
cmon --version
Best regards,
Johan
Hi Johan,
Here's the version command output:
cmon, version 1.4.2.2189
Ports 4567 and 4568 are open on the security group, but the new node doesn't seem to be using them:
- @new-node$ cat < /dev/tcp/localhost/4567
-bash: connect: Connection refused
- @old-node$ cat < /dev/tcp/localhost/4567
$??ک#??o?o?RP??/?8e??V?:*,i_
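The same /dev/tcp probe can be wrapped in a small helper to check all the ports mentioned in this thread in one go. This is only a sketch: the function names port_open and check_ports are made up for illustration, and the 2-second timeout is an arbitrary choice.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, the same mechanism as the cat probes above.
port_open() {
    local host=$1 port=$2
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# check_ports HOST: report the state of the MySQL port (3306) and the
# Galera ports (4567, 4568) discussed in this thread.
check_ports() {
    local host=$1 port
    for port in 3306 4567 4568; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed"
        fi
    done
}
```

Running check_ports localhost on each node, or check_ports NEWNODE_IP from another node, should show whether anything is listening on each port and whether it is reachable.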
I'm not mapping the servers in /etc/hosts, but the other 2 nodes in the cluster are working fine without mapping.
I've included the line "skip_name_resolve=OFF" in my template and now I don't see the "could not be resolved" message in the logs. But the problem persists.
The MySQL service is running on the new node, but something's wrong.
Running SHOW VARIABLES LIKE 'wsrep_%'; shows some useful information, such as:
- wsrep_cluster_address = lists all the 3 nodes
- wsrep_cluster_name = is correct
Running SHOW GLOBAL STATUS LIKE 'wsrep_%'; shows more:
- wsrep_cluster_size = 0
- wsrep_cluster_state_uuid = <null>
- wsrep_cluster_status = Disconnected
- wsrep_connected = OFF
- wsrep_ready = ON
PS: While writing this message I found something really useful in the node logs. The problem seems to be a bug in the base CentOS 7.3 version. I've run into this error before and solved it by upgrading to CentOS 7.4. I hadn't noticed that the new node was using the older version. I should have noticed it before.
The log messages are:
- 2017-11-01 14:54:54 20126 [ERROR] WSREP: wsrep_load(): dlopen(): /usr/lib64/libgalera_smm.so: symbol SSL_COMP_free_compression_methods, version libssl.so.10 not defined in file libssl.so.10 with link time reference
- 2017-11-01 14:54:54 20126 [ERROR] WSREP: wsrep_load(/usr/lib64/libgalera_smm.so) failed: Invalid argument (22). Reverting to no provider.
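For anyone hitting the same dlopen() error, one way to confirm it before relaunching might be to check whether the installed libssl actually exports the symbol named in the log. This is a rough sketch under a few assumptions: has_symbol is a made-up helper name, objdump (from binutils) is installed, and the library is at /usr/lib64/libssl.so.10, which I'm assuming from the usual CentOS 7 layout.

```shell
# has_symbol LIB SYMBOL: succeed if the shared library lists SYMBOL in its
# dynamic symbol table. Fails if the file is missing or objdump is unavailable.
has_symbol() {
    local lib=$1 sym=$2
    objdump -T "$lib" 2>/dev/null | grep -qw "$sym"
}

# Example (path is an assumption, symbol name is taken from the log above):
#   has_symbol /usr/lib64/libssl.so.10 SSL_COMP_free_compression_methods \
#       && echo "symbol present" || echo "symbol missing"
```

If the symbol is reported missing, that matches the error above, and the Galera provider would not load on that node.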
I'll launch a new node with the newer OS and try adding it again.
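Once a node is up, the wsrep status values listed above can be checked in one go instead of by eye. The helper below is a sketch, not anything ClusterControl provides: wsrep_healthy is a made-up name, and it assumes a joined node reports wsrep_cluster_status = Primary, wsrep_connected = ON and wsrep_ready = ON.

```shell
# wsrep_healthy: read "name value" lines on stdin (for example the output of
# SHOW GLOBAL STATUS LIKE 'wsrep_%') and succeed only if the node looks joined.
wsrep_healthy() {
    local name value status= connected= ready=
    while read -r name value; do
        case $name in
            wsrep_cluster_status) status=$value ;;
            wsrep_connected)      connected=$value ;;
            wsrep_ready)          ready=$value ;;
        esac
    done
    [ "$status" = "Primary" ] && [ "$connected" = "ON" ] && [ "$ready" = "ON" ]
}

# Example, run on the node itself (-N drops the header row, -B gives tab-separated output):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_healthy && echo "node joined"
```

With the values from my broken node (Disconnected / OFF / ON) the check fails, which is the expected result.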
Hi Johan,
Everything is working fine now that I've upgraded to the CentOS 7.4 image.
I have one last question about adding nodes using ClusterControl. I'm currently setting a hardcoded value for the wsrep_cluster_name parameter in my template, but I was wondering if there's a dynamic variable like:
- wsrep_cluster_address="@WSREP_CLUSTER_ADDRESS@"
- wsrep_node_address=@HOST@
Is there a @WSREP_CLUSTER_NAME@ equivalent?
I've tried using this one, but the value wasn't replaced during the provisioning process.
Regards,
Francisco Andrade