Add Existing Cluster


Comments

71 comments

  • chris

    Hi,

    Can someone help me, please? I'm new to this ClusterControl setup. If I run apt-get install mariadb-client mariadb-galera-server-5.5 percona-toolkit percona-xtrabackup manually it works fine, but as soon as I run it through ClusterControl it fails. I think it overrides my sources.list file on Ubuntu 12.04 LTS.

    Here is my error message:

    34: Setting up the first server failed, aborting

    33: clusterwb1: Setup server failure, see the previous msgs.

    32: clusterwb1: failed apt-get install mariadb-client mariadb-galera-server-5.5 percona-toolkit percona-xtrabackup exited with 100: E: Unable to correct problems, you have held broken packages.

    31: clusterwb1: Installing MariaDB-5.5 debian

    30: clusterwb1: Using External repositories.

    29: clusterwb1: Installing the MySQL packages.

    28: clusterwb1: Prepare MySQL environment (user/group)

    27: clusterwb1: Removing old MySQL packages from the host.

    26: clusterwb1: Detected free disk space: 55701 MB

    25: clusterwb1: Checking free-disk space of /var/lib/mysql

    24: clusterwb1: Detected memory: 1467 MB

    23: clusterwb1: Detecting total memory.

    22: clusterwb1: Detected CPU cores: 2

    21: clusterwb1: Detecting number of CPU cores/threads.

    20: Verifying helper packages (checking if 'socat' is installed successfully).

    19: clusterwb1: Installation report of helper packages: ok: psmisc ok: rsync ok: libaio1 ok: netcat ok: netcat-openbsd ok: socat ok: lsb-release ok: libssl0.9.8 ok: libssl1.0.0 ok: libdbd-mysql-perl ok: wget ok: curl ok: pigz

    18: clusterwb1: Installing helper packages.

    17: clusterwb1: Setting vm.swappiness = 1

    16: clusterwb1: Tuning OS parameters.

    15: clusterwb1: Keeping existing firewall settings.

    14: clusterwb1: Detecting OS

    13: clusterwb1: Checking SELinux status.

    12: Using sudo_password.

    11: clusterwb1: Verifying sudo on the server.

    10: Verifying the SSH connection to clusterwb1.

    9: clusterwb1: Checking if host is already exist in other cluster.

    8: Checking job parameters.

    7: create_cluster: calling job: setupServer(clusterwb1).

    6: Testing SSH to 192.168.11.38.

    5: Testing SSH to clusterwb3.

    4: Testing SSH to clusterwb2.

    3: Testing SSH to clusterwb1.

    2: Verifying job parameters.

    1: Message sent to controller

     

  • Johan

    Hi,

    What load balancer is it?

    BR

    johan

  • chris

    Hi,

    It works when I set up MariaDB manually and then add it via "Add existing server/cluster" in the CC UI.

    Just a quick question: I already have a load balancer set up, and my plan is to use MariaDB Galera with 3 nodes and ClusterControl. So how do I add the other nodes to the MariaDB Galera cluster? Do I just click "Add node"?

    Thanks

    Chris

  • chris

    Hi,

    I got the same error output (Unable to correct problems, you have held broken packages) with what you suggested. Regarding my previous question: should I set up the complete cluster manually and then add it when I'm done? I ask because I have limited testing hardware.

     

    Thanks

    Chris

  • Krzysztof Ksiazek

    Hi,

    Once you set up this single node and add it via "Add existing server/cluster", a new cluster will show up in the CC UI. You can enter this cluster and use "Add node" from within it. You'll have two options: you can either provision a node from scratch, or add an existing one that is already part of the cluster. I'd suggest trying to set up a new one. If you see a problem similar to the first one, please open a ticket with us. While it's possible to set everything up manually and then add those nodes to CC, that's not how we'd like it to work. If you can't provision a node from scratch, it's either a problem in your particular setup or a bug in CC. We'd like to learn what's going on either way, so we can fix the bug or find a workaround, whatever is causing the problems.

    Thanks,

    Krzysztof

  • chris

    Hi,

    It's very strange to me as well, but I'll set up one node manually. Can I then add the rest via "Add existing server/cluster" in the CC UI, or should I set up all of them manually?

    Thanks,

    Chris

  • Krzysztof Ksiazek

    Hi,

    This error is expected: if you ask CC to provision a host where MySQL is already running, it will stop the process in case it's a mistake; we don't want to cause any harm.

    I'm not sure what exactly happened here, though. Have you installed the MariaDB node manually? Can it be stopped? If yes, you can try two things. First, stop MariaDB and then restart the job to create a new cluster from the CC UI. If it fails with the previous error (Unable to correct problems, you have held broken packages), you can set up the MariaDB node manually and then use "Add existing server/cluster" in the CC UI.
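    The first suggested step, stopping the manually installed MariaDB before re-running the deploy job, could look like this (a sketch; the service name "mysql" is an assumption for Ubuntu 12.04 and may be "mariadb" on other setups):

```shell
# Stop the manually installed MariaDB so the deploy job finds a clean host.
# NOTE: "mysql" is the usual service name on Ubuntu 12.04; adjust if needed.
sudo service mysql stop
# Confirm it is actually down before retrying the job from the CC UI:
sudo service mysql status || echo "mysql is stopped"
```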

    If the problem persists, I'd suggest opening a ticket with us so we can look at what's going on in a screen-sharing session (TeamViewer or Join.me).

    Thanks,

    Krzysztof

  • chris

    Krzysztof,

    I get no output on my testing server; nothing is holding the package. I tried the install, ran the job again, and holding or unholding the package gives me this error:

    456 - Message sent to controller

    457 - Verifying job parameters.

    458 - Testing SSH to clusterwb1.

    459 - Testing SSH to clusterwb2.

    460 - Testing SSH to clusterwb3.

    461 - Testing SSH to 192.168.11.38.

    462 - create_cluster: calling job: setupServer(clusterwb1).

    463 - Checking job parameters.

    464 - clusterwb1: Checking if host is already exist in other cluster.

    465 - Verifying the SSH connection to clusterwb1.

    466 - clusterwb1: Verifying sudo on the server.

    467 - Using sudo_password.

    468 - clusterwb1: Checking SELinux status.

    469 - clusterwb1: Detecting OS

    470 - clusterwb1: There is a mysqld server running. It must be uninstalled first, or you can also add it to ClusterControl.

    471 - Setting up the first server failed, aborting

    Thanks,

    Chris

  • Krzysztof Ksiazek

    Hi,

    Can you please paste here the output of:

    dpkg --get-selections | grep hold

    Thanks,

    Krzysztof
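    For reference, a held package can be inspected and released like this (a sketch; the package name below is only an example, substitute whatever the hold list actually shows):

```shell
# List packages that dpkg/apt is holding back:
dpkg --get-selections | grep hold

# Release a hold (example package name; use your own):
sudo apt-mark unhold mysql-server-5.5
# Equivalent on older dpkg-only systems:
echo "mysql-server-5.5 install" | sudo dpkg --set-selections

# Then let apt try to repair the dependency state:
sudo apt-get update && sudo apt-get -f install
```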

  • chris

    Hi,

    I use HAProxy for my web server. I saw in the UI that you can add an existing load balancer; will that work?

    Thanks,

    Chris.

  • Krzysztof Ksiazek

    Cindy,

    I'm not talking about SSH. The problem, if it exists, concerns MySQL connectivity. Can you run "telnet 10.40.191.171 3306" from the 10.40.123.67 host?

    Thanks,

    Krzysztof
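    If telnet isn't installed on the controller, the same reachability check can be done with nc or bash's built-in /dev/tcp (a sketch; the IPs are the ones from this thread):

```shell
# Classic check from 10.40.123.67:
telnet 10.40.191.171 3306

# Alternatives when telnet is missing:
nc -zv 10.40.191.171 3306
timeout 3 bash -c '</dev/tcp/10.40.191.171/3306' && echo "port open" || echo "port closed"
```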

  • Cindy F. Ferrer

    Krzysztof,

    Please see attached screenshot of connecting to 171 from controller (.67) using cmon user. 

    Thanks,

    Cindy

  • Krzysztof Ksiazek

    Cindy,

    This error:

    124: Connection failed from 10.40.123.67 to 10.40.191.171: can't determine the datadir on '10.40.191.171:3306' connect, failed for user 'cmon': Can't connect to MySQL server on '10.40.191.171' (4) (errno: 2003)

    is related to MySQL connectivity. Are you sure that you can connect from 10.40.123.67 to 10.40.191.171 on port 3306? The error seems to indicate it's not possible.

     

    Thanks,

    Krzysztof

  • Cindy F. Ferrer

    I have already run this command.

    MariaDB [(none)]> GRANT ALL ON *.* TO 'cmon'@'10.40.123.67' IDENTIFIED BY '<password>' WITH GRANT OPTION;

    Query OK, 0 rows affected (0.00 sec)

    MariaDB [(none)]> flush privileges;

    Query OK, 0 rows affected (0.01 sec)
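    To double-check that a grant like this actually works over the network, a client-side test from the controller exercises both the network path and the privileges at once (a sketch using the user, hosts, and port from this thread; <password> stays a placeholder):

```shell
# Run on the controller (10.40.123.67); replace <password> with the real one.
mysql -h 10.40.191.171 -P 3306 -u cmon -p'<password>' -e "SELECT CURRENT_USER();"
# If this fails with errno 2003, the problem is the network path or the
# server's bind-address, not the grant itself (check my.cnf and firewalls).
```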

  • Cindy F. Ferrer

    Hi Krzysztof,

    Here it is. I can successfully SSH from 10.40.123.67 to 10.40.191.171 using the cmon user. datadir is also defined in the config file. Do I need an outbound SSH connection from the cluster nodes to the controller (i.e. SSH in the reverse direction)?

    124: Connection failed from 10.40.123.67 to 10.40.191.171: can't determine the datadir on '10.40.191.171:3306' connect, failed for user 'cmon': Can't connect to MySQL server on '10.40.191.171' (4) (errno: 2003)

    123: Checking connectivity and determining the datadir on the MySQL nodes.

    122: Detected OS = 'redhat'

    121: Detecting the OS.

    120: Node is Synced : 10.40.192.75

    119: Node is Synced : 10.40.193.69

    118: Node is Synced : 10.40.191.171

    117: Granting the controller on the cluster.

    116: Detected that skip_name_resolve is not used on the target server(s).

    115: Check SELinux statuses

    114: Verifying the SSH connection to the nodes.

    113: Checking the nodes that those aren't in another cluster.

    112: Found in total 3 nodes.

    111: Found node: '10.40.192.75'

    110: Found node: '10.40.193.69'

    109: Found node: '10.40.191.171'

    108: Getting node list from the MySQL server.

    107: monitored_mysql_root_password is not set, please set it later the generated cmon.cnf

    106: Verifying the MySQL user/password.

    Thanks,

    Cindy

  • Krzysztof Ksiazek

    Cindy,

    Can you please check the details of the Admin -> Cluster Jobs -> Failed job? They may give some more insight into what happened than the messages in the progress window (and from what you pasted, it looks to me like those are logs from the progress window). This is especially true when there are issues with credentials.

    Thanks,

    Krzysztof

  • Cindy F. Ferrer

    Hi,

    I restarted testing this application in a different test environment. After confirming that all prerequisites work properly, I am now adding the existing cluster. I created a separate MySQL user instead of root. The problem is that the job fails without a detailed reason. What are the possible causes of this?

    This is using the cmon user with proper grants.

    54 - Message sent to controller

    55 - Verifying controller host and cmon password.

    56 - Verifying the SSH access to the controller.

    57 - Verifying job parameters.

    58 - Verifying the SSH connection to 10.40.191.171.

    59 - Verifying the MySQL user/password.

    60 - monitored_mysql_root_password is not set, please set it later the generated cmon.cnf

    61 - Getting node list from the MySQL server.

    62 - Found node: '10.40.191.171'

    63 - Found node: '10.40.193.69'

    64 - Found node: '10.40.192.75'

    65 - Found in total 3 nodes.

    66 - Checking the nodes that those aren't in another cluster.

    67 - Verifying the SSH connection to the nodes.

    68 - Check SELinux statuses

    69 - Detected that skip_name_resolve is not used on the target server(s).

    70 - Granting the controller on the cluster.

    71 - Node is Synced : 10.40.191.171

    72 - Node is Synced : 10.40.193.69

    73 - Node is Synced : 10.40.192.75

    Job failed.

    And this result when I use root mysql user.


    78 - Message sent to controller

    79 - Verifying controller host and cmon password.

    80 - Verifying the SSH access to the controller.

    81 - Verifying job parameters.

    82 - Verifying the SSH connection to 10.40.191.171.

    83 - Verifying the MySQL user/password.

    84 - Getting node list from the MySQL server.

    85 - Found node: '10.40.191.171'

    86 - Found node: '10.40.193.69'

    87 - Found node: '10.40.192.75'

    88 - Found in total 3 nodes.

    89 - Checking the nodes that those aren't in another cluster.

    90 - Verifying the SSH connection to the nodes.

    91 - Check SELinux statuses

    92 - Detected that skip_name_resolve is not used on the target server(s).

    93 - Granting the controller on the cluster.

    94 - Node is Synced : 10.40.191.171

    95 - Node is Synced : 10.40.193.69

    96 - Node is Synced : 10.40.192.75

    97 - Detecting the OS.

    Job failed.

    Please see attached files for admin cluster jobs logs.

    Regards,

    Cindy

     

  • Ashraf Sharif

    Ahmad,

    The error indicates that it couldn't SSH to the controller host (192.168.0.6). Run the following command on the ClusterControl host:

    ssh-copy-id root@192.168.0.6

    ClusterControl also needs passwordless SSH set up to all managed nodes, including itself.
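    Setting up and verifying passwordless SSH for every node at once could look like this (a sketch; the host list is assembled from addresses mentioned in this thread, so substitute your own):

```shell
# Generate a key on the ClusterControl host only if none exists yet:
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

# Copy the key to the controller itself and to every managed node:
for host in 192.168.0.6 clusterwb1 clusterwb2 clusterwb3; do
    ssh-copy-id "root@$host"
    # BatchMode forces a failure instead of a password prompt, so this
    # confirms that key-based login actually works:
    ssh -o BatchMode=yes "root@$host" true && echo "$host: key auth OK"
done
```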

    Regards,

    Ashraf

  • Art van Scheppingen

    Hi Mmin,

    I'll create a ticket for this as it requires us to have a look at configuration files and other log files of your ClusterControl instance.

    Let's continue there.

    Best regards,

    Art

  • Ali Raza

    Click "Add Cluster" and after a while the cluster will appear; otherwise "Job failed" will be shown on the screen and you can investigate what the problem is.

  • Andreas Loos

    Hi Johan,
    I deleted all nodes of a cluster, then repaired the cluster and re-added it via "Add Existing Cluster".
    Now the cluster appears twice in ClusterControl.
    Where can I delete the old one?

    Regards
    Andreas

  • Brandon Lee

    Hey Art,

    Thanks for the info. What happened was: I installed CC, upgraded it after a few days, and the error appeared when I tried to add the cluster. I resolved it by redeploying the server from scratch, so I was unable to test your recommendation. Nonetheless, thanks for your help.

  • Art van Scheppingen

    Hi Brandon,

    It looks like you have an issue with the communication between the UI and the cmon RPC.

    Did you recently upgrade ClusterControl? If so, can you check whether it works in another browser? Sometimes JavaScript gets cached aggressively by a browser and a newer version isn't picked up.

    Let me know if this works.

    Best regards,

    Art

  • Brandon Lee

    Hello there,

    I am trying to add a cluster and I am getting the error "System failed to initiate a job command. You may want to try again.

    [object Object]". There are no errors in cmon.log. Can you let me know how I can debug this? I can SSH to CC itself, and to all the servers.

  • Permanently deleted user

    I did a fresh install after having made so many changes while on the call with Art. While installing the ClusterControl packages, I discovered that cmon is already updated to version 1.2.12.1117 in the repo. So there is no need to manually synchronise the cluster; with the latest packages, the entire installation and "Add Existing Cluster" work just fine.

    Thanks a lot Art and of course the development team in the background. Excellent work!

  • Permanently deleted user

    Just completed the first troubleshooting session. It turns out that cmon detects the OS by first reading the entries in /etc/system-release and, only if that file doesn't exist, by looking at /etc/redhat-release. Since all the servers run Oracle Linux, cmon couldn't detect the OS properly, as /etc/system-release doesn't contain any string containing "Redhat". Art got development to release a new version, 1.2.12.1117, which checks /etc/redhat-release first to discover the OS. This patch resolves the issue and "Add Existing Cluster" completes just fine.

    Once the job completes, the cluster needs to be synchronized. This is done in the ClusterControl web GUI: navigate to "Settings" > "Cluster Registrations". A new window appears; there, click "Synchronize Clusters" (it's in the top right-hand corner, opposite the title "Cluster Registration").
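    The detection order described above can be sketched as a small shell function (an illustration only; the real logic lives inside the cmon binary). The optional root prefix is only there to make the function easy to exercise against a fake filesystem:

```shell
# Post-1.2.12.1117 order: /etc/redhat-release first, then /etc/system-release.
detect_os() {
    local root=${1:-}
    if [ -f "$root/etc/redhat-release" ]; then
        cat "$root/etc/redhat-release"
    elif [ -f "$root/etc/system-release" ]; then
        cat "$root/etc/system-release"
    else
        echo "unknown"
    fi
}
```

On an Oracle Linux box both files exist, so checking redhat-release first is what makes detection succeed there.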

  • Permanently deleted user

    sure, thanks a lot.

  • Ahmad

    25 - Message sent to controller

    26 - Verifying controller host and cmon password.

    27 - Verifying the SSH access to the controller.

    28 - Could not SSH to controller host (192.168.0.6): libssh auth error: Access denied. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password (root, key=/root/.ssh/id_rsa)

    Job failed.

    I'm also facing the same issue. Following the solution above, same as Cindy, I already allowed port 22 and changed the permissions on the key, but the error still occurs.

    May I know what other things I'm supposed to do?

    Thanks.

  • Permanently deleted user

    No, unfortunately not.

    And there's no cluster visible at the URL: https://localhost/clustercontrol/admin/#g:clusters...

  • Art van Scheppingen

    Hi Mmin,

    In the job output you sent, everything looks fine. Are there any messages coming after 37?

    Best regards,

    Art



Powered by Zendesk