Add Existing Cluster

Comments

70 comments

  • Johan

    Hi Cindy,

    Can you create a support ticket at http://support.severalnines.com/tickets/new so that we could perhaps have a remote session to debug this?

    Best regards

    Johan 

  • Cindy F. Ferrer

    Hi Johan,

    I'm all set! Here is Alex's resolution. These are the changes that were made:

    • Upgraded to the latest frontend + backend builds of ClusterControl 1.2.9

    • Enabled passwordless sudo for the 'cmon' os user for all db nodes.

    • Changed wsrep_cluster_address=gcomm://ip1,ip2,ip3?pc.wait_prim=no to wsrep_cluster_address=gcomm://ip1,ip2,ip3. pc.wait_prim=no can only be used when the SST method is mysqldump, while we have rsync set.

      Also ClusterControl supports bootstrapping a Galera cluster (and handles cluster/node recovery) so that option is no longer needed either.

    • Copied /etc/my.cnf.d/cluster.cnf to /etc/my.cnf instead.
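For reference, the wsrep_cluster_address change above amounts to dropping everything after the "?". A small Python sketch of that split follows; the function name is made up for illustration, and ip1,ip2,ip3 are the placeholders used in the list above.

```python
def split_gcomm_address(address):
    """Split a gcomm:// address into (address without options, option string).

    Hypothetical helper for illustration only; it does plain string
    handling and does not validate the node list.
    """
    prefix = "gcomm://"
    if not address.startswith(prefix):
        raise ValueError(f"not a gcomm address: {address!r}")
    # Everything after the first '?' is treated as provider options.
    nodes, _, options = address[len(prefix):].partition("?")
    return prefix + nodes, options


# Example with the placeholder address from this thread:
# split_gcomm_address("gcomm://ip1,ip2,ip3?pc.wait_prim=no")
# returns ("gcomm://ip1,ip2,ip3", "pc.wait_prim=no")
```

The first element is the value that ended up in the configuration after the fix; the second shows which option was removed.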

     

    Regards,

    Cindy

     

  • Johan

    Hi Cindy!

    Thanks for updating this forum article with this information! 

    Much appreciated!

    Regarding "/etc/my.cnf.d/cluster.cnf": in version 1.2.10 we will be able to handle configuration files in locations other than /etc/mysql/my.cnf and /etc/my.cnf (in 1.2.9, however, "/etc/my.cnf.d/server.cnf" is also supported).
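To see which of the configuration file locations mentioned here actually exist on a node, something like the following Python sketch could be run on that node. The path list only repeats the locations named in this thread, and the function name is an assumption for illustration.

```python
from pathlib import Path

# Locations mentioned in this thread; adjust for your own setup.
CANDIDATE_CONFIGS = [
    "/etc/my.cnf",
    "/etc/mysql/my.cnf",
    "/etc/my.cnf.d/server.cnf",
]


def existing_configs(candidates=CANDIDATE_CONFIGS):
    """Return the candidate paths that exist as regular files, in order."""
    return [path for path in candidates if Path(path).is_file()]


# Usage on a database node:
# print(existing_configs())
```

If the list comes back empty, the node probably keeps its settings somewhere else (as with /etc/my.cnf.d/cluster.cnf above).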

    Have a nice weekend.

    BR

    johan

  • Daniel McDermott

    I'm using the user 'darkwingduck' to access each of the servers in the cluster; however, it doesn't appear to allow access with the credentials I added:

     

    4: Could not SSH to controller host (10.1.1.220): libssh auth error: Access denied. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password (darkwingduck, key=/home/darkwingduck/.ssh/id_rsa)

    3: Verifying the SSH access to the controller.

    2: Verifying controller host and cmon password.

    1: Message sent to controller

     

    My port number is non-standard and I'm scratching my head about what to do next...

  • Ashraf Sharif

    Hi Daniel,

    It sounds like you haven't set up passwordless SSH to the controller host. On the ClusterControl host, as the darkwingduck user, please run:

    ssh-copy-id -i /home/darkwingduck/.ssh/id_rsa darkwingduck@10.1.1.220

     

    This is explained in detail here: http://www.severalnines.com/ccg/28-passwordless-ssh
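After copying the key, one way to confirm that passwordless SSH really works is to try a non-interactive login. Below is a rough Python sketch that wraps the standard OpenSSH client; the function names are made up, and the port number in the usage comment is only an example because the actual non-standard port was not given in this thread. Note that ssh-copy-id also accepts -p for a non-default port.

```python
import subprocess


def ssh_check_command(user, host, key_path, port=22):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-i", key_path,
        "-p", str(port),
        "-o", "BatchMode=yes",       # never fall back to a password prompt
        "-o", "ConnectTimeout=5",
        f"{user}@{host}",
        "true",                      # harmless remote command
    ]


def passwordless_ssh_works(user, host, key_path, port=22):
    """Return True if the key-based login succeeds without any prompt."""
    result = subprocess.run(
        ssh_check_command(user, host, key_path, port),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Usage (port 2222 is a placeholder, not taken from the thread):
# passwordless_ssh_works("darkwingduck", "10.1.1.220",
#                        "/home/darkwingduck/.ssh/id_rsa", port=2222)
```

If this returns False, the "libssh auth error: Access denied" message above is the expected result in ClusterControl as well.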

    Regards,

    Ashraf

  • Ahmad

    25 - Message sent to controller

    26 - Verifying controller host and cmon password.

    27 - Verifying the SSH access to the controller.

    28 - Could not SSH to controller host (192.168.0.6): libssh auth error: Access denied. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password (root, key=/root/.ssh/id_rsa)

    Job failed.

    I'm facing the same issue. Following the solution above, same as Cindy, I have already allowed port 22 and changed the permissions on the key, but the error still occurs.

    May I know what else I am supposed to do?

    Thanks.

  • Ashraf Sharif

    Ahmad,

    The error indicates that it couldn't SSH to the controller host (192.168.0.6). Run the following command on the ClusterControl host:

    ssh-copy-id root@192.168.0.6

    ClusterControl also needs to have passwordless SSH set up to all managed nodes, including itself.

    Regards,

    Ashraf

  • Cindy F. Ferrer

    Hi,

    I restarted testing this application in a different test environment. After confirming that all prerequisites are working properly, I am now adding the existing cluster. I created a separate MySQL user instead of using root. The problem is that the job fails without a detailed reason. What could be causing this?

    Using cmon user with proper grants. 

    54 - Message sent to controller

    55 - Verifying controller host and cmon password.

    56 - Verifying the SSH access to the controller.

    57 - Verifying job parameters.

    58 - Verifying the SSH connection to 10.40.191.171.

    59 - Verifying the MySQL user/password.

    60 - monitored_mysql_root_password is not set, please set it later the generated cmon.cnf

    61 - Getting node list from the MySQL server.

    62 - Found node: '10.40.191.171'

    63 - Found node: '10.40.193.69'

    64 - Found node: '10.40.192.75'

    65 - Found in total 3 nodes.

    66 - Checking the nodes that those aren't in another cluster.

    67 - Verifying the SSH connection to the nodes.

    68 - Check SELinux statuses

    69 - Detected that skip_name_resolve is not used on the target server(s).

    70 - Granting the controller on the cluster.

    71 - Node is Synced : 10.40.191.171

    72 - Node is Synced : 10.40.193.69

    73 - Node is Synced : 10.40.192.75

    Job failed.

    And this is the result when I use the root MySQL user:

    78 - Message sent to controller

    79 - Verifying controller host and cmon password.

    80 - Verifying the SSH access to the controller.

    81 - Verifying job parameters.

    82 - Verifying the SSH connection to 10.40.191.171.

    83 - Verifying the MySQL user/password.

    84 - Getting node list from the MySQL server.

    85 - Found node: '10.40.191.171'

    86 - Found node: '10.40.193.69'

    87 - Found node: '10.40.192.75'

    88 - Found in total 3 nodes.

    89 - Checking the nodes that those aren't in another cluster.

    90 - Verifying the SSH connection to the nodes.

    91 - Check SELinux statuses

    92 - Detected that skip_name_resolve is not used on the target server(s).

    93 - Granting the controller on the cluster.

    94 - Node is Synced : 10.40.191.171

    95 - Node is Synced : 10.40.193.69

    96 - Node is Synced : 10.40.192.75

    97 - Detecting the OS.

    Job failed.

    Please see attached files for admin cluster jobs logs.

    Regards,

    Cindy

     

  • Krzysztof Ksiazek

    Cindy,

    Can you please check the details of the Admin -> Cluster Jobs -> Failed job? They may give some more insight into what happened than the messages in the progress window (and by looking at what you pasted it looks to me like those are logs from the progress window). It's especially true when there are issues with credentials.
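The pastes in this thread come in two shapes: "N - message" listed oldest first, and "N: message" listed newest first. When comparing them, it can help to normalise both into one ordered list and look at the final message. Here is a rough Python sketch of that; the regular expression and function names are assumptions based only on the pasted output, not on any documented ClusterControl format.

```python
import re

# Matches lines such as "57 - Verifying job parameters." or
# "124: Connection failed from ..." as pasted in this thread.
LINE_RE = re.compile(r"^\s*(\d+)\s*[-:]\s*(.*)$")


def parse_job_messages(text):
    """Return (sequence number, message) pairs sorted oldest first."""
    messages = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            messages.append((int(match.group(1)), match.group(2).strip()))
    return sorted(messages)


def last_message(text):
    """Return the most recent message, or None if nothing parsed."""
    messages = parse_job_messages(text)
    return messages[-1][1] if messages else None
```

The last message before "Job failed." is usually the place to start reading, whichever window the text was copied from.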

    Thanks,

    Krzysztof

  • Cindy F. Ferrer

    Hi Krzysztof,

    Here it is. I can successfully SSH from 10.40.123.67 to 10.40.191.171 using the cmon user. The datadir is also defined in the config file. Do I need an outbound SSH connection from the cluster nodes to the controller (that is, SSH in the reverse direction)?

    124: Connection failed from 10.40.123.67 to 10.40.191.171: can't determine the datadir on '10.40.191.171:3306' connect, failed for user 'cmon': Can't connect to MySQL server on '10.40.191.171' (4) (errno: 2003)

    123: Checking connectivity and determining the datadir on the MySQL nodes.

    122: Detected OS = 'redhat'

    121: Detecting the OS.

    120: Node is Synced : 10.40.192.75

    119: Node is Synced : 10.40.193.69

    118: Node is Synced : 10.40.191.171

    117: Granting the controller on the cluster.

    116: Detected that skip_name_resolve is not used on the target server(s).

    115: Check SELinux statuses

    114: Verifying the SSH connection to the nodes.

    113: Checking the nodes that those aren't in another cluster.

    112: Found in total 3 nodes.

    111: Found node: '10.40.192.75'

    110: Found node: '10.40.193.69'

    109: Found node: '10.40.191.171'

    108: Getting node list from the MySQL server.

    107: monitored_mysql_root_password is not set, please set it later the generated cmon.cnf

    106: Verifying the MySQL user/password.

    Thanks,

    Cindy

  • Cindy F. Ferrer

    I have already run this command.

    MariaDB [(none)]> GRANT ALL ON *.* TO 'cmon'@'10.40.123.67' IDENTIFIED BY '<password>' WITH GRANT OPTION;

    Query OK, 0 rows affected (0.00 sec)

    MariaDB [(none)]> flush privileges;

    Query OK, 0 rows affected (0.01 sec)

  • Krzysztof Ksiazek

    Cindy,

    This error:

    124: Connection failed from 10.40.123.67 to 10.40.191.171: can't determine the datadir on '10.40.191.171:3306' connect, failed for user 'cmon': Can't connect to MySQL server on '10.40.191.171' (4) (errno: 2003)

    is related to MySQL connectivity. Are you sure that you can connect from 10.40.123.67 to 10.40.191.171 on port 3306? The error seems to indicate that it's not possible.

     

    Thanks,

    Krzysztof

  • Cindy F. Ferrer

    Krzysztof,

    Please see attached screenshot of connecting to 171 from controller (.67) using cmon user. 

    Thanks,

    Cindy

  • Krzysztof Ksiazek

    Cindy,

    I'm not talking about SSH. The problem, if there is one, concerns MySQL connectivity. Can you run "telnet 10.40.191.171 3306" from the 10.40.123.67 host?
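If telnet is not installed on the controller, a plain TCP connection test does the same job. Below is a minimal Python sketch using the standard socket module; the function name is made up, and the host and port in the usage comment are the ones from this thread. It only checks that the port is reachable and does not verify the MySQL user or password.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage, run on the controller (10.40.123.67):
# can_connect("10.40.191.171", 3306)
```

A False result here would match the errno 2003 in the job log and would point to a firewall, a bind-address setting, or routing rather than to the grants.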

    Thanks,

    Krzysztof

  • chris

    Hi,

    Can someone help me, please? I'm new to ClusterControl. When I run apt-get install mariadb-client mariadb-galera-server-5.5 percona-toolkit percona-xtrabackup manually, it works fine, but as soon as I run it through ClusterControl it fails. I think it overrides my sources.list file on Ubuntu 12.04 LTS.

    Here is my error message:

    34: Setting up the first server failed, aborting

    33: clusterwb1: Setup server failure, see the previous msgs.

    32: clusterwb1: failed apt-get install mariadb-client mariadb-galera-server-5.5 percona-toolkit percona-xtrabackup exited with 100: E: Unable to correct problems, you have held broken packages.

    31: clusterwb1: Installing MariaDB-5.5 debian

    30: clusterwb1: Using External repositories.

    29: clusterwb1: Installing the MySQL packages.

    28: clusterwb1: Prepare MySQL environment (user/group)

    27: clusterwb1: Removing old MySQL packages from the host.

    26: clusterwb1: Detected free disk space: 55701 MB

    25: clusterwb1: Checking free-disk space of /var/lib/mysql

    24: clusterwb1: Detected memory: 1467 MB

    23: clusterwb1: Detecting total memory.

    22: clusterwb1: Detected CPU cores: 2

    21: clusterwb1: Detecting number of CPU cores/threads.

    20: Verifying helper packages (checking if 'socat' is installed successfully).

    19: clusterwb1: Installation report of helper packages: ok: psmisc ok: rsync ok: libaio1 ok: netcat ok: netcat-openbsd ok: socat ok: lsb-release ok: libssl0.9.8 ok: libssl1.0.0 ok: libdbd-mysql-perl ok: wget ok: curl ok: pigz

    18: clusterwb1: Installing helper packages.

    17: clusterwb1: Setting vm.swappiness = 1

    16: clusterwb1: Tuning OS parameters.

    15: clusterwb1: Keeping existing firewall settings.

    14: clusterwb1: Detecting OS

    13: clusterwb1: Checking SELinux status.

    12: Using sudo_password.

    11: clusterwb1: Verifying sudo on the server.

    10: Verifying the SSH connection to clusterwb1.

    9: clusterwb1: Checking if host is already exist in other cluster.

    8: Checking job parameters.

    7: create_cluster: calling job: setupServer(clusterwb1).

    6: Testing SSH to 192.168.11.38.

    5: Testing SSH to clusterwb3.

    4: Testing SSH to clusterwb2.

    3: Testing SSH to clusterwb1.

    2: Verifying job parameters.

    1: Message sent to controller

     

  • Krzysztof Ksiazek

    Hi,

    Can you please paste here the output of:

    dpkg --get-selections | grep hold
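Note that grep hold also matches any package whose name happens to contain "hold". If the output is long, a stricter check on the state column can be sketched in Python as below. The function name is hypothetical, and the input is assumed to be the two-column text that dpkg --get-selections prints.

```python
def held_packages(selections_text):
    """Return package names whose selection state is exactly 'hold'.

    Expects the text output of `dpkg --get-selections`, one
    "<package> <state>" pair per line.
    """
    held = []
    for line in selections_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "hold":
            held.append(fields[0])
    return held


# Usage on the Ubuntu host, feeding in the real command output:
# import subprocess
# text = subprocess.run(["dpkg", "--get-selections"],
#                       capture_output=True, text=True).stdout
# print(held_packages(text))
```

An empty list means no package is marked as held, in which case the apt-get error is more likely caused by conflicting repositories or dependencies.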

    Thanks,

    Krzysztof

  • chris

    Krzysztof,

    I get no output on my test server, so nothing is holding the package. I tried to install it and then ran the job again; whether I hold or unhold the package, I get this error:

    456 - Message sent to controller

    457 - Verifying job parameters.

    458 - Testing SSH to clusterwb1.

    459 - Testing SSH to clusterwb2.

    460 - Testing SSH to clusterwb3.

    461 - Testing SSH to 192.168.11.38.

    462 - create_cluster: calling job: setupServer(clusterwb1).

    463 - Checking job parameters.

    464 - clusterwb1: Checking if host is already exist in other cluster.

    465 - Verifying the SSH connection to clusterwb1.

    466 - clusterwb1: Verifying sudo on the server.

    467 - Using sudo_password.

    468 - clusterwb1: Checking SELinux status.

    469 - clusterwb1: Detecting OS

    470 - clusterwb1: There is a mysqld server running. It must be uninstalled first, or you can also add it to ClusterControl.

    471 - Setting up the first server failed, aborting

    Thanks,

    Chris

  • Krzysztof Ksiazek

    Hi,

    This error is expected - if you asked CC to provision a host where MySQL is already running, it will stop the process in case it's a mistake - we don't want to cause any harm.

    I'm not sure what exactly happened here, though. Have you installed the MariaDB node manually? Can it be stopped? If yes, you can try two things. First, you can stop MariaDB and then restart the job to create a new cluster from the CC UI. If it fails with the previous error (Unable to correct problems, you have held broken packages), you can try to set up the MariaDB node manually and then use "Add existing server/cluster" from the CC UI.

    If the problem persists, I'd suggest opening a ticket with us, and we can try to look at what's going on using some kind of screen-sharing session (TeamViewer or Join.me).

    Thanks,

    Krzysztof

  • chris

    Hi,

    It's very strange to me as well, but I'll set up one node manually. Can I then add the rest via "Add existing server/cluster" from the CC UI, or should I set up all of them manually?

    Thanks,

    Chris

  • Krzysztof Ksiazek

    Hi,

    Once you set up this single node and add it via "Add existing server/cluster", a new cluster will show up in the CC UI. You can enter this cluster and choose "Add node" from within it. You'll have two options: provision a node from scratch, or add an existing one that is already part of the cluster. I'd suggest trying to provision a new one. If you see a similar problem as with the first one, please open a ticket with us. While it's possible to set up everything manually and then add those nodes to CC, that's not how we'd like it to work. If you can't provision a node from scratch, it's either a problem in your particular setup or a bug in CC. We'd like to learn what's going on whatever the culprit is, so that we can fix the bug or find a workaround.

    Thanks,

    Krzysztof

  • chris

    Hi,

    I got the same error (Unable to correct problems, you have held broken packages) after doing what you suggested. Regarding my previous question: should I set up the complete cluster manually and then add it when I'm done? I ask because I have limited test hardware.

     

    Thanks

    Chris

  • chris

    Hi,

    It's working when I set up MariaDB manually and then add it via "Add existing server/cluster" from the CC UI.

    Just a quick question: I already have a load balancer set up, and my initial plan is to use MariaDB Galera with 3 nodes and ClusterControl. How do I add the other nodes to the MariaDB Galera cluster? Do I just click "Add node"?

    Thanks

    Chris

  • Johan

    Hi,

    What load balancer is it?

    BR

    johan

  • chris

    Hi,

    I use HAProxy for my web server. I saw in the UI that you can add an existing load balancer. Will it work?

    Thanks,

    Chris.

  • Johan

    Hi,

    Yes, HAProxy is supported.

    BR

    johan

  • Petr Hrabal

    Hi,

    I hit a strange error when adding a standalone MySQL server:

    77 - Message sent to controller

    78 - Verifying controller host and cmon password.

    79 - Verifying the SSH access to the controller.

    80 - Verifying job parameters.

    81 - Verifying the SSH connection to <IP omitted>.

    82 - Verifying SUDO on <IP omitted>.

    83 - Passwordless sudo for user 'root' is not available on the host <IP omitted>: sudo error, retval: 127, output: 'sh: 1: sudo: not found'

    Job failed.

     

    I don't really understand why the script is attempting to test "sudo" when logging in as root. Any advice?

     

  • Krzysztof Ksiazek

    Hi,

    We'll take a look at this. It may take a while as some internal syncing with developers may be needed. We'll keep you posted as soon as we come up with some solution.

    Thanks,

    Krzysztof

  • Johan

    Hi,

    Assuming you are on 1.2.11 already:

    Please do (as root/sudo):

    service cmon stop

    debian/ubuntu:

    apt-get update && apt-get install clustercontrol-controller clustercontrol clustercontrol-cmonapi

    redhat/centos:

    yum clean all; yum install clustercontrol-controller clustercontrol clustercontrol-cmonapi

    Then do:

    service cmon start

    You should now have:

    Controller:  1.2.11-998

    UI:   1.2.11-842
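To check an installed build against the numbers above without falling into the trap of comparing version strings as text (where "1.2.9" would sort after "1.2.11"), a small Python sketch like this could help. The function names are made up, and the "release-build" format is only inferred from the version strings quoted in this thread.

```python
def parse_build(version):
    """Turn '1.2.11-998' into a comparable tuple such as (1, 2, 11, 998)."""
    release, _, build = version.partition("-")
    numbers = tuple(int(part) for part in release.split("."))
    # A missing build suffix is treated as build 0.
    return numbers + (int(build) if build else 0,)


def is_at_least(installed, required):
    """Return True if the installed version is the required one or newer."""
    return parse_build(installed) >= parse_build(required)


# Usage with the controller build mentioned above:
# is_at_least("1.2.11-998", "1.2.11-998")   # True
# is_at_least("1.2.9-500", "1.2.11-998")    # False
```

The build number 500 in the usage comment is a placeholder, not a real release.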

    Many thanks for this report!

    BR

    Johan

  • Petr Hrabal

    Hi,

    It's working fine after the update.

    Thanks for the quick assistance.

    BR

    Petr

  • Min Huber

    Hello,

    I don't understand why adding the cluster doesn't work on my installation. All servers are running Red Hat 7. It is a Galera cluster with MariaDB 10.1.11 on 3 nodes running galera-3-25. The ClusterControl controller is running cmon version 1.2.12.1111.

    Job output is:

    1 Message sent to controller

    2 Verifying controller host and cmon password.

    3 Checking the SSH access to the controller (10.52.202.116)

    4 Checking ssh/sudo on 1 hosts.

    5 10.52.202.116: access with ssh/sudo granted

    6 All 1 hosts are ok.

    7 Verifying job parameters.

    8 Checking ssh/sudo on 1 hosts.

    9 10.52.207.81: access with ssh/sudo granted

    10 All 1 hosts are ok.

    11 10.52.207.81: Verifying the MySQL user/password.

    12 10.52.207.81: Getting node list from the MySQL server.

    13 Found node: '10.52.207.81'

    14 Found node: '10.52.207.42'

    15 Found node: '10.52.207.59'

    16 Found in total 3 nodes.

    17 Checking that nodes are not in another cluster.

    18 Checking ssh/sudo on 3 hosts.

    19 10.52.207.81: already checked 1 seconds ago.

    20 10.52.207.42: access with ssh/sudo granted

    21 10.52.207.59: access with ssh/sudo granted

    22 All 3 hosts are ok.

    23 Check SELinux statuses

    24 Detected that skip_name_resolve is used on the target server(s).

    25 Granting the controller on the cluster.

    26 granting addresses: 10.52.202.116

    27 10.52.207.81: Node is synced.

    28 10.52.207.42: Node is synced.

    29 10.52.207.59: Node is synced.

    30 Connected succesfully: 'cmon@10.52.202.116' -> '10.52.207.81:3306'.

    31 MySQL[10.52.207.81] datadir: /var/lib/mysql/

    32 Connected succesfully: 'cmon@10.52.202.116' -> '10.52.207.42:3306'.

    33 MySQL[10.52.207.42] datadir: /var/lib/mysql/

    34 Connected succesfully: 'cmon@10.52.202.116' -> '10.52.207.59:3306'.

    35 MySQL[10.52.207.59] datadir: /var/lib/mysql/

    36 Detecting the OS.

    37

    Any ideas why?

    Thanks and regards,

    Min
