Failed to create galera node for cluster
I have tried deploying a new Galera cluster on a STIGged RHEL 8.8 server, first with the ClusterControl web app and later with the s9s command-line tool. I hit a few errors along the way, usually related to permissions, and the error message usually gave me some clue as to where the problem was. Now the job fails with the message 'Failed to create galera node for cluster', with no indication of why. I have looked through the logs but cannot find any additional information. If anything useful was logged to the MariaDB log, the job deletes it automatically on failure, along with everything else MariaDB-related. I would appreciate any tips on how to locate and fix the issue.

Job log:
{
  "command": "create_cluster",
  "job_data": {
    "cluster_name": "stigcluster",
    "cluster_type": "galera",
    "deploy_agents": true,
    "disable_firewall": true,
    "enable_uninstall": false,
    "install_software": true,
    "mysql_password": "",
    "mysql_user": "",
    "ssh_keyfile": "/root/.ssh/id_rsa",
    "ssh_user": "chrisp",
    "vendor": "mariadb",
    "version": "10.6",
    "nodes": [ { "hostname": "192.168.0.246", "protocol": "mysql" } ]
  }
}

2025-05-17T00:45:53.000Z: Job is already closed. Failure creating galera cluster: Setting up one server failed: 192.168.0.246: Setup server failure, see the previous messages.
2025-05-17T00:45:53.000Z: Removed failed cluster 34
2025-05-17T00:45:53.000Z: Failed to purge cluster data on dcps db: Table 'dcps.clusters' doesn't exist: select id from dcps.clusters where cluster_id=34
2025-05-17T00:45:53.000Z: Cluster data has been purged from cmon db.
2025-05-17T00:45:53.000Z: Cluster is not active now, trying to cleanup
2025-05-17T00:45:53.000Z: Cluster 34 will be delete but not removing backups.
2025-05-17T00:45:53.000Z: Checking cluster.
2025-05-17T00:45:53.000Z: Failed to create galera node for cluster 34
2025-05-17T00:45:52.000Z: 192.168.0.246: Installing cronie.
2025-05-17T00:45:51.000Z: 192.168.0.246: Installing boost-program-options.
2025-05-17T00:45:49.000Z: Setting up mysql dependencies on RedHat/Centos 8.
2025-05-17T00:45:49.000Z: 192.168.0.246: Executing command 'systemctl restart dnf-makecache.timer'
2025-05-17T00:45:49.000Z: 192.168.0.246: Executing command 'echo 'max_parallel_downloads=10' >> /etc/dnf/dnf.conf'
2025-05-17T00:45:49.000Z: 192.168.0.246: Executing command 'echo 'metadata_timer_sync=3600' >> /etc/dnf/dnf.conf'
2025-05-17T00:45:49.000Z: 192.168.0.246: Confifure dnf-makecache.timer
2025-05-17T00:45:49.000Z: 192.168.0.246: Installing mariadb (galera) on redhat.
2025-05-17T00:45:49.000Z: 192.168.0.246:3306: MariaDB-server.x86_64 version to install from repository: 10.6.22
2025-05-17T00:45:45.000Z: 192.168.0.246:3306: Galera packages to install: MariaDB-server.x86_64 MariaDB-client.x86_64 galera-4 MariaDB-common.x86_64 MariaDB-backup
2025-05-17T00:45:45.000Z: 192.168.0.246:3306: Installing the MySQL packages.
2025-05-17T00:45:40.000Z: 192.168.0.246: OS user mysql already exists.
2025-05-17T00:45:40.000Z: 192.168.0.246: Checking OS user mysql.
2025-05-17T00:45:40.000Z: 192.168.0.246:3306: Prepare MySQL environment (user/group).
2025-05-17T00:45:40.000Z: MariaDB Server repository setup script executed on 192.168.0.246
2025-05-17T00:45:31.000Z: MariaDB repository setup script downloaded and verified on 192.168.0.246
2025-05-17T00:45:30.000Z: 192.168.0.246: Downloading MariaDB repository setup script (https://r.mariadb.com/downloads/mariadb_repo_setup).
2025-05-17T00:45:30.000Z: MariaDB repository setup handled with vendor's script. More info: https://mariadb.com/kb/en/mariadb-package-repository-setup-and-usage/
2025-05-17T00:45:30.000Z: 192.168.0.246: Using external repositories.
2025-05-17T00:45:29.000Z: 192.168.0.246:3306: Total disk space: 5.39 GiB.
2025-05-17T00:45:29.000Z: 192.168.0.246:3306: Free disk space: 4.96 GiB.
2025-05-17T00:45:29.000Z: 192.168.0.246:3306: Checking free-disk space of '/var/lib/mysql'.
2025-05-17T00:45:29.000Z: 192.168.0.246:3306: Detected memory: 7.77 GiB.
2025-05-17T00:45:29.000Z: 192.168.0.246: Detected memory: 7952MB.
2025-05-17T00:45:29.000Z: 192.168.0.246:3306: Detecting total memory.
2025-05-17T00:45:29.000Z: Loaded template '/usr/share/cmon/templates/my.cnf.mdb106+-galera'.
2025-05-17T00:45:29.000Z: Using config template '/usr/share/cmon/templates/my.cnf.mdb106+-galera'.
2025-05-17T00:45:29.000Z: 192.168.0.246:3306: Using template 'my.cnf.mdb106+-galera'.
2025-05-17T00:45:29.000Z: Cluster type is 'galera'.
2025-05-17T00:45:29.000Z: Using skip_name_resolve as all servers are defined by IP.
2025-05-17T00:45:29.000Z: Auto-generating MySQL root password.
2025-05-17T00:45:29.000Z: 192.168.0.246: Verifying helper packages (checking 'socat').
2025-05-17T00:45:29.000Z: 192.168.0.246: Finished with helper packages.
2025-05-17T00:45:27.000Z: 192.168.0.246: Upgrading openssl openssl-libs.
2025-05-17T00:45:26.000Z: 192.168.0.246: Installing 'dnf-command(config-manager)'.
2025-05-17T00:45:25.000Z: 192.168.0.246: Installing openssh.
2025-05-17T00:45:23.000Z: 192.168.0.246: Installing openssl.
2025-05-17T00:45:22.000Z: 192.168.0.246: Installing tar.
2025-05-17T00:45:21.000Z: 192.168.0.246: Installing bzip2.
2025-05-17T00:45:19.000Z: 192.168.0.246: Installing pigz.
2025-05-17T00:45:18.000Z: 192.168.0.246: Installing gnupg2.
2025-05-17T00:45:17.000Z: 192.168.0.246: Installing curl.
2025-05-17T00:45:15.000Z: 192.168.0.246: Installing wget.
2025-05-17T00:45:14.000Z: 192.168.0.246: Installing libevent.
2025-05-17T00:45:13.000Z: 192.168.0.246: Installing libaio.
2025-05-17T00:45:11.000Z: 192.168.0.246: Installing rsync.
2025-05-17T00:45:10.000Z: 192.168.0.246: Installing psmisc.
2025-05-17T00:45:09.000Z: 192.168.0.246: Installing which.
2025-05-17T00:45:07.000Z: 192.168.0.246: Installing perl-Data-Dumper.
2025-05-17T00:45:06.000Z: 192.168.0.246: Installing socat.
2025-05-17T00:45:05.000Z: 192.168.0.246: Installing net-tools.
2025-05-17T00:45:04.000Z: 192.168.0.246: Installing http://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm.
2025-05-17T00:45:02.000Z: 192.168.0.246: Upgrading ca-certificates.
2025-05-17T00:45:01.000Z: 192.168.0.246: Upgrading nss.
2025-05-17T00:45:01.000Z: 192.168.0.246: Installing helper packages.
2025-05-17T00:45:00.000Z: 192.168.0.246: Setting vm.swappiness = 1.
2025-05-17T00:45:00.000Z: 192.168.0.246: Tuning OS parameters.
2025-05-17T00:45:00.000Z: 192.168.0.246: Flushing iptables.
2025-05-17T00:44:56.000Z: 192.168.0.246: Disabling firewalld.
2025-05-17T00:44:54.000Z: 192.168.0.246: Checking firewall.
2025-05-17T00:44:54.000Z: 192.168.0.246: Checking SELinux status (enabled = false).
2025-05-17T00:44:53.000Z: 192.168.0.246: Checking and disabling AppArmor.
2025-05-17T00:44:53.000Z: 192.168.0.246: Type is 'redhat'.
2025-05-17T00:44:53.000Z: 192.168.0.246: Release is '8'.
2025-05-17T00:44:53.000Z: 192.168.0.246: Vendor is 'rhel'.
2025-05-17T00:44:53.000Z: Calling job: setupServer(192.168.0.246:3306).
2025-05-17T00:44:53.000Z: wsrep_cluster_address = 'gcomm://'
2025-05-17T00:44:53.000Z: Creating the cluster with the following:
2025-05-17T00:44:53.000Z: Allocated cluster id 34.
2025-05-17T00:44:53.000Z: 192.168.0.246: FIPS mode is enabled.
2025-05-17T00:44:51.000Z: 192.168.0.246: Checking OS information.
2025-05-17T00:44:49.000Z: 192.168.0.246:3306: Checking if host already exists in another cluster.
2025-05-17T00:44:49.000Z: 192.168.0.246: Access with ssh/sudo granted.
2025-05-17T00:44:49.000Z: 192.168.0.246: Checking ssh/sudo with credentials ssh_cred_job_6656.
2025-05-17T00:44:48.000Z: Verifying job parameters.
2025-05-17T00:44:48.000Z: Cluster will be created on 1 data node(s).
2025-05-17T00:44:48.000Z: CMON version 2.3.1.12052.
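Because the job log is listed newest first, the step that ran just before the failure is easier to spot once the entries are put back in chronological order. A minimal Python sketch, assuming the log has been saved as text with entries in the `TIMESTAMP: message` form shown above; the function names are hypothetical and not part of ClusterControl or s9s:

```python
import re

# Matches the timestamp prefix used in the job log,
# e.g. "2025-05-17T00:45:53.000Z: ".
TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z): ")


def parse_job_log(text):
    """Return (timestamp, message) pairs in the order they appear in the text."""
    parts = TS.split(text)
    # parts[0] is whatever precedes the first timestamp (e.g. the job JSON);
    # after that, timestamps and messages alternate.
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


def steps_before_failure(text, marker="Failed to create galera node", count=3):
    """Return the last `count` entries logged before the first one containing `marker`."""
    entries = parse_job_log(text)
    # The log is printed newest first. Reversing restores chronological order
    # and keeps the relative order of entries that share a timestamp.
    entries.reverse()
    for i, (_, message) in enumerate(entries):
        if marker in message:
            return entries[max(0, i - count):i]
    return []
```

Run against the log above, the last entry returned should be the `Installing cronie.` line, which only shows what was logged last before the failure, not necessarily what caused it.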
Official comment
It is indeed hard to determine what caused the issue. I see the job stops right after it tries to install the cron package (cronie). I have run into a similar issue before but have not seen it since, so I presume it was a bug that has since been fixed. We are already at version 2.3.2 (check our changelog page).
Other than that, you can also check `/var/log/messages` on your DB nodes, and `/var/log/mysql/mysqld.log`, for any information about why it fails.
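One way to narrow down a large `/var/log/messages` is to filter it for a few search terms. The sketch below is only an illustration: the function name is hypothetical, the search terms are guesses based on the job log (for example `cronie`, the last package it mentions), and reading the file usually requires root.

```python
from pathlib import Path


def matching_lines(log_path, keywords):
    """Yield (line_number, line) for lines containing any keyword, ignoring case."""
    wanted = [k.lower() for k in keywords]
    # errors="replace" keeps the scan going if the log contains bytes
    # that are not valid in the default text encoding.
    with Path(log_path).open(errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            lowered = line.lower()
            if any(k in lowered for k in wanted):
                yield number, line.rstrip("\n")
```

For example, `for n, line in matching_lines("/var/log/messages", ["cronie", "denied"]): print(n, line)` would print the matching lines with their line numbers, so the surrounding context can then be read in the file itself.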
If you need further help from us, you can raise a Zendesk ticket so we can dig deeper in case this is a bug. Before doing so, however, I suggest upgrading ClusterControl to the latest available version, to verify that the issue still exists there.
Thanks.
Paul, thank you very much for your input. I installed the newest version available at the time, a few months ago, which was 2.3.1, so I will go ahead and update to 2.3.2 and try again. I won't be able to check the MariaDB logs, because the ClusterControl deployer installs MariaDB and its logs in the same place and then removes them all when the cluster fails to deploy. It would be nice to have an option that could be toggled to NOT remove everything on deploy failure, for troubleshooting purposes. At any rate, I will upgrade as you suggested, and if the problem still exists, I will open a Zendesk ticket.
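Until such an option exists, one possible workaround is to copy the log files to a safe location while the deployment job is still running, so that a copy survives the cleanup. The following is a rough Python sketch, not a ClusterControl feature: the function names are hypothetical, and the paths passed to it are assumptions that must match wherever the node actually writes its logs.

```python
import shutil
import time
from pathlib import Path


def snapshot_once(sources, dest_dir):
    """Copy every source file that currently exists into dest_dir; return the copied names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, sources):
        try:
            if src.is_file():
                # Each copy overwrites the previous snapshot, so dest_dir
                # always holds the most recent version that was seen.
                shutil.copy2(src, dest / src.name)
                copied.append(src.name)
        except FileNotFoundError:
            # The file was removed between the check and the copy;
            # the earlier snapshot, if any, is left in place.
            pass
    return copied


def snapshot_loop(sources, dest_dir, interval=1.0, duration=600.0):
    """Call snapshot_once every `interval` seconds for roughly `duration` seconds."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        snapshot_once(sources, dest_dir)
        time.sleep(interval)
```

Started as root on the DB node just before submitting the job, for example with `snapshot_loop(["/var/log/mysql/mysqld.log"], "/root/cc-debug")` (the destination directory is made up for this example), it should leave behind the last version of the log it managed to copy before the deployer removed the original. Anything written between the final copy and the cleanup would still be lost.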