This article describes how to add an existing cluster to ClusterControl.
Supported clusters are:
- Galera Clusters
- Collection of MySQL Servers
- MongoDB/TokuMX
Assumptions:
- Use a dedicated server for the CMON Controller and UI. Do not co-locate it with the cluster nodes.
- Use the same OS for the Controller/UI server as for the cluster nodes. Do not mix operating systems; mixed setups are untested.
- This guide assumes you install as root. Read more on server requirements.
Outline of steps:
- Set up SSH to the cluster nodes
- Prepare the monitored MySQL servers (part of the cluster)
- Installation
- Add the existing cluster
Limitations:
- All nodes must be started and reachable. For Galera, all nodes must be SYNCED when adding the existing cluster.
- If the controller is deployed on an IP address (hostname=<ipaddr> in /etc/cmon.cnf) and the MySQL/Galera server nodes are deployed without skip_name_resolve, then GRANTs may fail (see the check below this list). http://54.248.83.112/bugzilla/show_bug.cgi?id=141
- GALERA: If wsrep_cluster_address contains a garbd, the installation may fail.
- GALERA: Auto-detection of the Galera nodes is based on wsrep_cluster_address. At least one Galera node must have wsrep_cluster_address=S1,..,Sn set, where S1 to Sn denote the Galera nodes that are part of the Galera cluster.
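To see whether skip_name_resolve is active on a MySQL/Galera node, a check along these lines can be run on each node (a sketch; adjust the user, password, and port to your setup):
mysql -uroot -p<root password> -h127.0.0.1 -P3306 -e "SHOW VARIABLES LIKE 'skip_name_resolve';" #ON means hostname lookups are skipped
#To enable it, add skip_name_resolve under the [mysqld] section of my.cnf and restart mysqld.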
Set up SSH from Controller -> Cluster, and Controller -> Controller
On the controller, as user 'root', run:
ssh-keygen -t rsa
(press Enter at all prompts)
For each cluster node, run:
ssh-copy-id root@<cluster_node_ip>
On the controller, run (the controller needs to be able to SSH to itself):
ssh-copy-id root@<controller_ip>
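To verify the passwordless setup, try a non-interactive login from the controller to each node and to the controller itself; a quick sanity check, assuming root and the default key location:
ssh -o BatchMode=yes root@<cluster_node_ip> hostname #should print the node's hostname without asking for a password
ssh -o BatchMode=yes root@<controller_ip> hostname #the controller must also be able to reach itself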
SSH as a Non-root user
For now you need to set up passwordless sudo.
Read more on server requirements.
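As an illustration only (your security policy may differ, and the user name 'cmonuser' below is just a placeholder), passwordless sudo can typically be granted with a sudoers drop-in like this on each node:
echo 'cmonuser ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/cmonuser #placeholder user; prefer editing via visudo
sudo chmod 0440 /etc/sudoers.d/cmonuser #sudoers files must not be world-writable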
Monitored MySQL Servers - GRANTs
This section applies to "Galera clusters" and "Collection of MySQL Servers".
The MySQL 'root' user on the monitored MySQL servers must be granted its privileges WITH GRANT OPTION.
On the monitored MySQL servers you must be able to connect like this:
mysql -uroot -p<root password> -h127.0.0.1 -P3306 #change port from 3306 to the MySQL port you use.
mysql -uroot -p<root password> -P3306 #change port from 3306 to the MySQL port you use.
and NOT like this:
mysql -uroot <--- INVALID EXAMPLE
If you cannot connect to the MySQL server(s) as in the first example above, run:
mysql> GRANT ALL ON *.* TO 'root'@'127.0.0.1' IDENTIFIED BY '<root password>' WITH GRANT OPTION;
mysql> GRANT ALL ON *.* TO 'root'@'localhost' IDENTIFIED BY '<root password>' WITH GRANT OPTION;
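After running the GRANT statements, you can confirm that the connection works and that the privileges include GRANT OPTION, for example (adjust the password and port to your setup):
mysql -uroot -p<root password> -h127.0.0.1 -P3306 -e "SHOW GRANTS FOR 'root'@'127.0.0.1';" #should show GRANT ALL ... WITH GRANT OPTION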
Installation
Follow the instructions below. The installer also sets up a MySQL server (if none is installed already) to host the monitoring and management data, the UI, and the CMON Controller with a minimal configuration.
$ wget http://severalnines.com/downloads/cmon/install-cc.sh
$ chmod +x install-cc.sh
$ sudo ./install-cc.sh
A longer version of the instructions is located here: http://support.severalnines.com/entries/23737976-ClusterControl-Installation
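Once the installer finishes, a quick way to confirm that the controller and UI are up is to check that the cmon process is running and that the web UI responds. This is only a sketch; the exact process name and URL may differ slightly on your system:
$ pgrep -l cmon #the CMON controller process should be listed
$ curl -k -I https://localhost/clustercontrol/ #the UI should answer with an HTTP status line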
Add Existing Cluster
In the UI, click the Add Existing Cluster button.
- All cluster nodes must have the same OS
- All cluster nodes must be installed in the same location
- All cluster nodes must listen on the same MySQL port
The screenshot below shows an example of adding a Galera Cluster with one node running on 10.0.1.10 (the rest of the Galera nodes are auto-detected).
The Galera node on 10.0.1.10 must have wsrep_cluster_address set. To verify this, open a terminal on 10.0.1.10 and run:
mysql -uroot -h127.0.0.1 -p
mysql> show global variables like 'wsrep_cluster_address';
+-----------------------+---------------------------------------+
| Variable_name         | Value                                 |
+-----------------------+---------------------------------------+
| wsrep_cluster_address | gcomm://10.0.1.10,10.0.1.11,10.0.1.12 |
+-----------------------+---------------------------------------+
1 row in set (0.01 sec)
Click "Add Cluster" and after a while the Cluster will appear, or else a "Job failed" will be shown on the screen and you can investigate what the problem is.
Comments
Hi,
Can someone help me please? I'm new to this ClusterControl setup. I ran apt-get install mariadb-client mariadb-galera-server-5.5 percona-toolkit percona-xtrabackup manually and it works fine, but as soon as I run it through ClusterControl it fails. I think it overrides my sources.list file on Ubuntu 12.04 LTS.
Here is my error message:
34: Setting up the first server failed, aborting
33: clusterwb1: Setup server failure, see the previous msgs.
32: clusterwb1: failed apt-get install mariadb-client mariadb-galera-server-5.5 percona-toolkit percona-xtrabackup exited with 100: E: Unable to correct problems, you have held broken packages.
31: clusterwb1: Installing MariaDB-5.5 debian
30: clusterwb1: Using External repositories.
29: clusterwb1: Installing the MySQL packages.
28: clusterwb1: Prepare MySQL environment (user/group)
27: clusterwb1: Removing old MySQL packages from the host.
26: clusterwb1: Detected free disk space: 55701 MB
25: clusterwb1: Checking free-disk space of /var/lib/mysql
24: clusterwb1: Detected memory: 1467 MB
23: clusterwb1: Detecting total memory.
22: clusterwb1: Detected CPU cores: 2
21: clusterwb1: Detecting number of CPU cores/threads.
20: Verifying helper packages (checking if 'socat' is installed successfully).
19: clusterwb1: Installation report of helper packages: ok: psmisc ok: rsync ok: libaio1 ok: netcat ok: netcat-openbsd ok: socat ok: lsb-release ok: libssl0.9.8 ok: libssl1.0.0 ok: libdbd-mysql-perl ok: wget ok: curl ok: pigz
18: clusterwb1: Installing helper packages.
17: clusterwb1: Setting vm.swappiness = 1
16: clusterwb1: Tuning OS parameters.
15: clusterwb1: Keeping existing firewall settings.
14: clusterwb1: Detecting OS
13: clusterwb1: Checking SELinux status.
12: Using sudo_password.
11: clusterwb1: Verifying sudo on the server.
10: Verifying the SSH connection to clusterwb1.
9: clusterwb1: Checking if host is already exist in other cluster.
8: Checking job parameters.
7: create_cluster: calling job: setupServer(clusterwb1).
6: Testing SSH to 192.168.11.38.
5: Testing SSH to clusterwb3.
4: Testing SSH to clusterwb2.
3: Testing SSH to clusterwb1.
2: Verifying job parameters.
1: Message sent to controller
Hi,
What load balancer is it?
BR
johan
Hi,
It's working when I set up MariaDB manually and then add it via "Add existing server/cluster" from the CC UI.
Just a quick question: I already have a load balancer set up, and my plan initially is to use MariaDB Galera with 3 nodes and ClusterControl. So how do I add the other nodes to use MariaDB Galera clustering - do I just click Add Node?
Thanks
Chris
Hi,
I got the same error output (Unable to correct problems, you have held broken packages) with what you suggested. Regarding my previous question: should I set up the complete cluster and then add it manually when I'm done? I ask because I have limited testing hardware.
Thanks
Chris
Hi,
Once you set up this single node and then add it via "Add existing server/cluster", a new cluster will show up in the CC UI. You can enter this cluster and from within it do "Add node". You'll have two options - you can either provision it from scratch or add an existing one, when the node is already part of the cluster. I'd suggest trying to set up a new one. If you see a similar problem as with the first one, please open a ticket with us. While it's possible to set up everything manually and then add those nodes to CC, it's not how we'd like it to work. If you can't provision a node from scratch, it's either a problem in your particular setup or a bug in CC. We'd like to learn what's going on no matter what the culprit is - we'd like to try to fix a bug or find a workaround, whatever is causing problems.
Thanks,
Krzysztof
Hi,
It's very strange to me as well, but I'll set up one node manually. Then can I add the rest via "Add existing server/cluster" from the CC UI, or should I set up all of them?
Thanks,
Chris
Hi,
This error is expected - if you ask CC to provision a host where MySQL is already running, it will stop the process in case it's a mistake - we don't want to cause any harm.
I'm not sure what exactly happened here, though - have you installed the MariaDB node manually? Can it be stopped? If yes, you can try two things. First - you can stop MariaDB and then restart the job to create a new cluster from the CC UI. If it fails with the previous error (Unable to correct problems, you have held broken packages), you can try to set up the MariaDB node manually and then do "Add existing server/cluster" from the CC UI.
If the problem persists, I'd suggest opening a ticket with us and we can try to look at what's going on using some kind of screen-share session (TeamViewer or Join.me).
Thanks,
Krzysztof
Krzysztof,
I get no output on my testing server - nothing is holding the package. I tried to install, then ran it again, and holding or unholding the package gives me this error:
456 - Message sent to controller
457 - Verifying job parameters.
458 - Testing SSH to clusterwb1.
459 - Testing SSH to clusterwb2.
460 - Testing SSH to clusterwb3.
461 - Testing SSH to 192.168.11.38.
462 - create_cluster: calling job: setupServer(clusterwb1).
463 - Checking job parameters.
464 - clusterwb1: Checking if host is already exist in other cluster.
465 - Verifying the SSH connection to clusterwb1.
466 - clusterwb1: Verifying sudo on the server.
467 - Using sudo_password.
468 - clusterwb1: Checking SELinux status.
469 - clusterwb1: Detecting OS
470 - clusterwb1: There is a mysqld server running. It must be uninstalled first, or you can also add it to ClusterControl.
471 - Setting up the first server failed, aborting
Thanks,
Chris
Hi,
Can you please paste here the output of:
dpkg --get-selections | grep hold
Thanks,
Krzysztof
Hi,
I use HAProxy for my web server. I saw in the UI that you can add an existing load balancer - will it work?
Thanks,
Chris.
Cindy,
I'm not talking about SSH. The problem, if it exists, is with MySQL connectivity. Can you run "telnet 10.40.191.171 3306" from the 10.40.123.67 host?
Thanks,
Krzysztof
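For reference, a connectivity test like the following, run from the controller host, should show whether the MySQL port is reachable (telnet or nc, whichever is installed; adjust the IP and port to your environment):
telnet 10.40.191.171 3306 #a successful connect prints the MySQL handshake
nc -zv 10.40.191.171 3306 #alternative if telnet is not installed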
Krzysztof,
Please see attached screenshot of connecting to 171 from controller (.67) using cmon user.
Thanks,
Cindy
Cindy,
This error:
124: Connection failed from 10.40.123.67 to 10.40.191.171: can't determine the datadir on '10.40.191.171:3306' connect, failed for user 'cmon': Can't connect to MySQL server on '10.40.191.171' (4) (errno: 2003)
is related to MySQL connectivity. Are you sure that you can connect from 10.40.123.67 to 10.40.191.171 on port 3306? The error seems to indicate it's not possible.
Thanks,
Krzysztof
I have already run this command.
MariaDB [(none)]> GRANT ALL ON *.* TO 'cmon'@'10.40.123.67' IDENTIFIED BY '<password>' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.01 sec)
Hi Krzysztof,
Here it is. I can successfully SSH from 10.40.123.67 to 10.40.191.171 using the cmon user. datadir is also defined in the config file. Do I need an outbound SSH connection from the cluster nodes to the controller (i.e. SSH in the reverse direction)?
124: Connection failed from 10.40.123.67 to 10.40.191.171: can't determine the datadir on '10.40.191.171:3306' connect, failed for user 'cmon': Can't connect to MySQL server on '10.40.191.171' (4) (errno: 2003)
123: Checking connectivity and determining the datadir on the MySQL nodes.
122: Detected OS = 'redhat'
121: Detecting the OS.
120: Node is Synced : 10.40.192.75
119: Node is Synced : 10.40.193.69
118: Node is Synced : 10.40.191.171
117: Granting the controller on the cluster.
116: Detected that skip_name_resolve is not used on the target server(s).
115: Check SELinux statuses
114: Verifying the SSH connection to the nodes.
113: Checking the nodes that those aren't in another cluster.
112: Found in total 3 nodes.
111: Found node: '10.40.192.75'
110: Found node: '10.40.193.69'
109: Found node: '10.40.191.171'
108: Getting node list from the MySQL server.
107: monitored_mysql_root_password is not set, please set it later the generated cmon.cnf
106: Verifying the MySQL user/password.
Thanks,
Cindy
Cindy,
Can you please check the details of the Admin -> Cluster Jobs -> Failed job? They may give more insight into what happened than the messages in the progress window (and from what you pasted, it looks to me like those are logs from the progress window). This is especially true when there are issues with credentials.
Thanks,
Krzysztof
Hi,
I restarted testing this application in a different test environment. After confirming all prerequisites are working properly, I am now adding the existing cluster. I created a separate MySQL user instead of root. The problem is that the job fails without a detailed reason. What is the possible cause of this?
Using cmon user with proper grants.
54 - Message sent to controller
55 - Verifying controller host and cmon password.
56 - Verifying the SSH access to the controller.
57 - Verifying job parameters.
58 - Verifying the SSH connection to 10.40.191.171.
59 - Verifying the MySQL user/password.
60 - monitored_mysql_root_password is not set, please set it later the generated cmon.cnf
61 - Getting node list from the MySQL server.
62 - Found node: '10.40.191.171'
63 - Found node: '10.40.193.69'
64 - Found node: '10.40.192.75'
65 - Found in total 3 nodes.
66 - Checking the nodes that those aren't in another cluster.
67 - Verifying the SSH connection to the nodes.
68 - Check SELinux statuses
69 - Detected that skip_name_resolve is not used on the target server(s).
70 - Granting the controller on the cluster.
71 - Node is Synced : 10.40.191.171
72 - Node is Synced : 10.40.193.69
73 - Node is Synced : 10.40.192.75
Job failed.
And this is the result when I use the root MySQL user:
78 - Message sent to controller
79 - Verifying controller host and cmon password.
80 - Verifying the SSH access to the controller.
81 - Verifying job parameters.
82 - Verifying the SSH connection to 10.40.191.171.
83 - Verifying the MySQL user/password.
84 - Getting node list from the MySQL server.
85 - Found node: '10.40.191.171'
86 - Found node: '10.40.193.69'
87 - Found node: '10.40.192.75'
88 - Found in total 3 nodes.
89 - Checking the nodes that those aren't in another cluster.
90 - Verifying the SSH connection to the nodes.
91 - Check SELinux statuses
92 - Detected that skip_name_resolve is not used on the target server(s).
93 - Granting the controller on the cluster.
94 - Node is Synced : 10.40.191.171
95 - Node is Synced : 10.40.193.69
96 - Node is Synced : 10.40.192.75
97 - Detecting the OS.
Job failed.
Please see attached files for admin cluster jobs logs.
Regards,
Cindy
Ahmad,
The error indicates that it couldn't SSH to the controller host (192.168.0.6). Run the following command on the ClusterControl host:
ssh-copy-id root@192.168.0.6
ClusterControl also needs passwordless SSH set up to all managed nodes, including itself.
Regards,
Ashraf
Hi Mmin,
I'll create a ticket for this as it requires us to have a look at configuration files and other log files of your ClusterControl instance.
Let's continue there.
Best regards,
Art
Hi Johan,
I deleted all nodes of a cluster. After that, I repaired the cluster.
Then I re-added the cluster via "Add Existing Cluster".
Now the cluster shows up twice in ClusterControl.
Where can I delete the old one?
Regards
Andreas
Hey Art,
Thanks for the info; what happened was I installed CC, upgraded it after a few days, and the error happened when I tried to add the cluster. I resolved it by redeploying the server from scratch, so I was unable to test your recommendation. Nonetheless, thanks for your help.
Hi Brandon,
It looks like you have an issue with the communication between the UI and the cmon rpc.
Did you recently upgrade ClusterControl? If so, can you check whether it works with another browser? Sometimes JavaScript gets cached aggressively by a browser and a newer version isn't picked up.
Let me know if this works.
Best regards,
Art
Hello there,
I am trying to add a cluster and I am getting the error "System failed to initiate a job command. You may want to try again.
[object Object]" - there are no errors in cmon.log. Can you let me know how I can debug this? I can SSH to the CC host itself and to all the servers.
I did a fresh install after having made so many changes while on the call with Art. I discovered while installing the ClusterControl packages that cmon is already updated to version 1.2.12.1117 in the repo, so there is no need to manually synchronise the cluster; with the latest packages the entire installation and the "Add existing cluster" job work just fine.
Thanks a lot Art and of course the development team in the background. Excellent work!
Just completed the first troubleshooting session. It turns out that cmon detects the OS by first reading the entries defined in /etc/system-release and, in case system-release doesn't exist, by looking at /etc/redhat-release. Since all the servers run on Oracle Linux, cmon can't detect the OS properly, as /etc/system-release doesn't contain any string containing "Redhat". Art got development to release a new version, 1.2.12.1117, which checks /etc/redhat-release first to discover the OS. This patch resolves the issue and "Add external cluster" completes just fine.
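For anyone hitting a similar OS-detection issue, looking at the two release files mentioned above (on one of the affected nodes) shows what cmon has to work with:
cat /etc/system-release /etc/redhat-release 2>/dev/null #on Oracle Linux the release strings differ from those on Red Hat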
Once the job is completed, the cluster needs to be synchronized first. This is done in the ClusterControl web GUI: navigate to "Settings" - "Cluster Registrations". A new window appears; there, click "Synchronize Clusters" (it's in the top right-hand corner, opposite the title "Cluster Registration").
sure, thanks a lot.
25 - Message sent to controller
26 - Verifying controller host and cmon password.
27 - Verifying the SSH access to the controller.
28 - Could not SSH to controller host (192.168.0.6): libssh auth error: Access denied. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password (root, key=/root/.ssh/id_rsa)
Job failed.
I'm also facing the same issue. Following the solution above, same as Cindy, I already allowed port 22 and changed the permissions on the key, but the error still occurs.
May I know what other things I am supposed to do?
Thanks.
No, unfortunately not.
And there's no cluster visible at url: https://localhost/clustercontrol/admin/#g:clusters...
Hi Mmin,
In the Job output you sent everything looks fine. Are there any messages coming after 37?
Best regards,
Art