Importing Cluster - Job Failed - Wrong node list
Hi guys,
I'm trying to import an existing Galera Cluster into ClusterControl, but the job is failing at the stage where it gets the list of nodes in the cluster.
The process is picking up the wrong node name/IP, as you can see in the logs:
- [12:27:48]: Adding existing MySQL cluster.
- [12:27:48]: 10.0.1.4: Checking ssh/sudo.
- [12:27:49]: 10.0.1.4: Access with ssh/sudo granted.
- [12:27:49]: 10.0.1.4:3306: Verifying the MySQL user/password.
- [12:40:20]: 10.0.1.4:3306: Getting node list from the MySQL server.
- [12:40:20]: /etc/profile.d/motd.sh: Checking wsrep_node_address.
- [12:40:20]: /etc/profile.d/motd.sh: Couldn't get wsrep_node_address status variable: ''
- [12:40:20]: Found node: '/etc/profile.d/motd.sh'
- [12:40:20]: Found in total 1 nodes.
- [12:40:20]: Checking that nodes are not in another cluster.
- [12:40:20]: /etc/profile.d/motd.sh: Checking ssh/sudo.
- [12:40:20]: /etc/profile.d/motd.sh: Libssh connect error: Failed to resolve hostname /etc/profile.d/motd.sh (Name or service not known).
This motd.sh file is an SSH welcome script.
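For reference, it's a small /etc/profile.d banner script, roughly like this (a simplified reconstruction, not the exact file; the command names are just illustrative):
#!/bin/bash
# /etc/profile.d/motd.sh - prints a welcome banner on login
# (simplified; the real script calls a couple of external tools by bare name)
figlet "$(hostname -s)"
echo "Welcome to $(hostname -f). Authorized access only."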
Logged in on the cluster node, I can't find that value in any status variable:
...
| wsrep_incoming_addresses | 10.0.1.4:3306 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 7ed839b7-d383-11e7-83d1-5f0879bdd20a |
| wsrep_cluster_conf_id | 1 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 7ed88466-d383-11e7-a904-6b68dd0dae89 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.21(r8678538) |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+
...
My my.cnf file has:
wsrep_node_address=10.0.1.4
wsrep_node_name=percona-node-1511794714
My /etc/hosts file looks ok:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
If I remove the script from the server, everything works fine, but I lose my welcome message.
Any suggestions on how to debug this?
Thanks in advance,
Francisco Andrade
Official comment
Hi Francisco,
Thanks for reporting this. The OS user must be configured with a proper PATH environment variable, as expected for any root/sudo user. The following PATH is expected:
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/user/.local/bin:/home/user/bin
Details at https://severalnines.com/docs/requirements.html#operating-system-user
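As a quick check (the user and IP below are just the ones from your post; adjust them to your environment), compare the PATH seen by a non-interactive SSH session with the PATH of a full login shell:
# PATH seen by a non-interactive SSH session (roughly what ClusterControl gets):
ssh user@10.0.1.4 'echo $PATH'
# PATH seen by a login shell on the same node, for comparison:
ssh user@10.0.1.4 'bash -lc "echo \$PATH"'
If the first one is missing the directories that motd.sh relies on, the script will fail exactly as it does in your job log.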
Regards,
Ashraf
I've just had some help debugging this and discovered that the problem is that the process executes the motd.sh script, which fails when ClusterControl runs it but does not fail when it is run manually.
It fails because the SSH session doesn't have the user's environment variables.
So the script tries to run a shell command and fails because it can't find the command.
When I add the full path to the command, the problem is solved.
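For anyone hitting the same thing, the change was essentially this (simplified; the command name is just an example):
# Before: breaks over non-interactive SSH because the command isn't on the minimal PATH
#figlet "$(hostname -s)"
# After: absolute path works both interactively and when ClusterControl connects
/usr/bin/figlet "$(hostname -s)"
# Alternative: skip the banner when no terminal is attached
# (the script is sourced by /etc/profile, so return is valid here)
#tty -s || return 0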
Thanks,
Francisco Andrade