Adding an external replica node to a cluster keeps failing
Hello,
I am using ClusterControl v2 to manage a Galera Cluster (MariaDB) deployed in a data center where each node has both external (public) and internal (private) IP addresses.
I am attempting to add a new external asynchronous replication node, with the database hosted in a separate data center. However, after completing the setup, the node's replication status in the interface keeps switching between "Primary" (operational) and "Replica (Failed)" every few seconds.

Upon reviewing the settings on the slave node, I discovered that during the creation process (via the GUI), ClusterControl mistakenly selected the master's internal IP (10.118.0.2) instead of its external IP. As a result, replication fails because the slave node cannot establish a connection to the master.
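A quick way to confirm this kind of misconfiguration is to check whether the address the replica is pointed at (the Master_Host column of SHOW SLAVE STATUS on the replica) lies in a private range that a node in another data center cannot route to. The sketch below is a minimal illustration using Python's standard ipaddress module; it assumes "internal" means the RFC 1918 IPv4 ranges, and the helper name is_rfc1918 is hypothetical.

```python
import ipaddress

# RFC 1918 private IPv4 ranges: addresses in these are normally reachable
# only inside their own private network / data center.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(host: str) -> bool:
    """Return True if `host` is an IP literal inside an RFC 1918 range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name rather than an IP literal: resolve it first to classify it.
        return False
    # Membership is False for IPv6 addresses, since the networks are IPv4.
    return any(addr in net for net in RFC1918_NETWORKS)


# 10.118.0.2 is the internal address from the post; 203.0.113.10 is a
# hypothetical stand-in for the master's public address.
for candidate in ("10.118.0.2", "203.0.113.10"):
    print(candidate, is_rfc1918(candidate))
```

If the replica's Master_Host is flagged as private while the replica sits in a different data center, the connection failure described above is the expected result.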

I attempted to stop the slave in order to update the MASTER_HOST but was unsuccessful.
Do you have any suggestions on how to resolve this issue?
Thank you!
George
-
Official comment
Hi,
When you deploy a Galera Cluster in ClusterControl, you have to specify the IP address or hostname for each node.

If you want to change the IP address after deploying it, you must modify the my.cnf configuration file to use the new IP address.
Can you please open a ticket with us here? https://support.severalnines.com
We can send you the instructions to change this there. Thank you.
Sebastian.
-
If adding an external replica node to the cluster keeps failing, start by checking network connectivity between the nodes and make sure a firewall is not blocking the required ports (3306 by default for MariaDB). Verify that the replica uses the correct replication settings, in particular the master's host address and port, and that the replication user's credentials are valid and carry the REPLICATION SLAVE privilege. Make sure the replica and primary run compatible MariaDB versions and that their system clocks are synchronized to avoid time-related issues. Finally, check the error logs on both the primary and the replica for specific error messages; they often point to the exact cause of the failure.
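The first check above, basic TCP reachability of the master's port, can be sketched with Python's standard socket module. This is a minimal sketch, assuming it is run on the replica host against the master's external address; the helper name can_reach is hypothetical. A successful connection only proves that the port is open through any firewalls; it says nothing about replication credentials or privileges.

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures (socket.gaierror)
        # are all subclasses of OSError.
        return False
```

For example, can_reach("203.0.113.10") (a hypothetical external master address) returning False from the replica while the same check against 3306 succeeds inside the master's own data center would point to a routing or firewall problem rather than a MariaDB one.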
-
ClusterControl likely auto-selected the master's internal IP. Remove the async slave from ClusterControl, then re-add it and manually choose the master's external IP. If needed, stop any ClusterControl jobs running against the node and fix it directly on the slave with:
STOP SLAVE;
CHANGE MASTER TO MASTER_HOST='external.ip', ...;
START SLAVE;
Also confirm that the firewall rules allow access to the master on port 3306.
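The manual fix in the reply above amounts to three statements run on the replica. The sketch below only builds those statement strings (it does not connect to MariaDB) so the shape of a valid CHANGE MASTER TO statement is explicit. The helper name repoint_replica_sql and the address 203.0.113.10 are hypothetical. It sets only MASTER_HOST and MASTER_PORT and assumes the replica already uses GTID-based positioning: in MariaDB, options not mentioned in CHANGE MASTER keep their previous values, but changing the host resets the stored binlog file and position, so a replica that relies on file/position coordinates would also need MASTER_LOG_FILE and MASTER_LOG_POS. Check the Using_Gtid column of SHOW SLAVE STATUS before relying on this.

```python
from typing import List


def repoint_replica_sql(new_master_host: str, master_port: int = 3306) -> List[str]:
    """Build statements that re-point a MariaDB replica at a new master address.

    Assumption: the replica uses GTID positioning, so binlog coordinates are
    not supplied here. Credentials and other options are left unchanged.
    """
    # Refuse characters that would break out of the SQL string literal.
    if "'" in new_master_host or "\\" in new_master_host:
        raise ValueError("host must not contain quotes or backslashes")
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_HOST='{new_master_host}', MASTER_PORT={int(master_port)};",
        "START SLAVE;",
    ]


# 203.0.113.10 is a hypothetical placeholder for the master's external IP.
for stmt in repoint_replica_sql("203.0.113.10"):
    print(stmt)
```

Generating the statements from one host value avoids retyping the address by hand; the printed output can then be pasted into a mariadb client session on the replica.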