Add Existing Cluster


Comments

70 comments

  • Cindy F. Ferrer

    Trying to add an existing cluster; after configuring sshd_config and the SSH keys I am still getting this error:

    61 - Message sent to controller

    62 - Verifying controller host and cmon password.

    63 - Verifying the SSH access to the controller.

    64 - Could not SSH to controller host (10.40.191.7): libssh auth error: Access denied. Authentication that can continue: publickey (root, key=/root/.ssh/id_rsa)

    Job failed

    Did I miss something? Yes, this is very basic, but I cannot move forward with my testing past this error.

  • Mick S

    Hi Johan,

    I just added a GRANT for the user 'cmon'@'IP_OPENVPN_SERVER'.

  • Krzysztof Ksiazek

    Hi,

    Once you set up this single node and then add it via "Add existing server/cluster", a new cluster will show up in the CC UI. You can enter this cluster and, from within it, do "Add node". You'll have two options: you can either provision a node from scratch, or add an existing one when the node is already part of the cluster. I'd suggest trying to set up a new one. If you see a problem similar to the one with the first node, please open a ticket with us. While it's possible to set everything up manually and then add those nodes to CC, that's not how we'd like it to work. If you can't provision a node from scratch, it's either a problem in your particular setup or a bug in CC. We'd like to learn what's going on no matter what the culprit is: we'd like to fix the bug or find a workaround, whatever is causing the problems.

    Thanks,

    Krzysztof

  • Daniel McDermott

    I'm using the user 'darkwingduck' to access each of the servers in the cluster; however, it doesn't appear to allow access via the credentials I added:

     

    4: Could not SSH to controller host (10.1.1.220): libssh auth error: Access denied. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password (darkwingduck, key=/home/darkwingduck/.ssh/id_rsa)

    3: Verifying the SSH access to the controller.

    2: Verifying controller host and cmon password.

    1: Message sent to controller

     

    My port number is non-standard and I'm scratching my head about what to do next...

  • Krzysztof Ksiazek

    Cindy,

    I'm not talking about SSH. The problem, if it exists, relates to MySQL connectivity. Can you run "telnet 10.40.191.171 3306" from the 10.40.123.67 host?
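    The same connectivity test can be scripted; a minimal sketch in Python (the host and port values come from this thread, adjust as needed):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """TCP reachability check, equivalent to `telnet host port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the 10.40.123.67 host one would call, e.g.:
# can_connect("10.40.191.171", 3306)
```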

    Thanks,

    Krzysztof

  • Johan

    Hi,

    On the controller, can you do as user root:

    ssh -v -i /root/.ssh/id_rsa  root@10.40.191.7

    What do you get?

    BR

    johan

  • Johan

    Hi Mick,

    What did you do to fix this? I just want to understand if we did something wrong, or if we can improve something to make it easier.

    BR

    johan

  • Johan

    Hi,

    What load balancer is it?

    BR

    johan

  • Petr Hrabal

    Hi,

    I hit strange error when adding standalone MySQL

    77 - Message sent to controller

    78 - Verifying controller host and cmon password.

    79 - Verifying the SSH access to the controller.

    80 - Verifying job parameters.

    81 - Verifying the SSH connection to <IP omitted>.

    82 - Verifying SUDO on <IP omitted>.

    83 - Passwordless sudo for user 'root' is not available on the host <IP omitted>: sudo error, retval: 127, output: 'sh: 1: sudo: not found'

    Job failed.

     

    I don't really understand why the script is attempting to test "sudo" when logging in as root... any advice?

  • Art van Scheppingen

    Hi Min,

    In the Job output you sent everything looks fine. Are there any messages coming after 37?

    Best regards,

    Art

  • Johan

    Hi Cindy,

    I believe we know the problem now.

    This string: pc.wait_prim=no  is at the end of the wsrep_cluster_addresses and cmon fails to parse it.

    We will fix this.
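    For reference, the fix amounts to dropping any trailing `?option=value` suffix when splitting the address into hosts; a minimal sketch in Python (a hypothetical helper, not the actual cmon code):

```python
def split_cluster_address(addr):
    """Split a wsrep_cluster_address such as
    'gcomm://h1,h2,h3?pc.wait_prim=no' into plain host entries,
    dropping any trailing ?option=value suffix."""
    if addr.startswith("gcomm://"):
        addr = addr[len("gcomm://"):]
    return [h.split("?", 1)[0].strip() for h in addr.split(",") if h.strip()]
```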

     

    Thanks a lot.

     

    BR

    johan

  • Johan

    Hi Cindy,

    It should be in /etc/httpd/conf.d/ssl.conf

    Can you check there? If this isn't enough info, let me know and I will ask my colleagues for more advice on this.
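    For disabling SSLv3 there, the usual Apache directive looks like this (a sketch; the exact file can differ by distribution, and httpd must be restarted afterwards):

```apache
# In /etc/httpd/conf.d/ssl.conf, inside the SSL virtual host:
SSLProtocol all -SSLv2 -SSLv3
```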

    BR

    johan

  • Johan

    hi,

    yes, HAProxy is supported. 

    BR

    johan

  • Krzysztof Ksiazek

    Hi,

    We'll take a look at this. It may take a while as some internal syncing with developers may be needed. We'll keep you posted as soon as we come up with some solution.

    Thanks,

    Krzysztof

  • Min Huber

    No, unfortunately not.

    And there's no cluster visible at the URL: https://localhost/clustercontrol/admin/#g:clusters...

  • Andreas Loos

    Hi Johan,
    I deleted all the nodes of a cluster. After that, I repaired the cluster.
    Then I re-added the cluster via "Add Existing Cluster".
    Now the cluster shows up twice in ClusterControl.
    Where can I delete the old one?

    Regards
    Andreas

  • Cindy F. Ferrer

    Hi Johan,

    The issue is now resolved. Port 22 was closed on the destination, and the key files needed proper file permissions.

    On each node:

    #chmod 700 ~/.ssh/authorized_keys

    #chmod 600 ~/.ssh/id_rsa.pub

    #chmod -R go-wr ~/.ssh
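    The same permission tightening can be reproduced and verified in a scratch directory (a sketch; on a real node the target is the SSH user's `~/.ssh`):

```shell
# Reproduce the permission fixes above in a throwaway directory.
ssh_dir="$(mktemp -d)/.ssh"
mkdir -p "$ssh_dir"
touch "$ssh_dir/authorized_keys" "$ssh_dir/id_rsa.pub"
chmod 700 "$ssh_dir/authorized_keys"   # as in the commands above
chmod 600 "$ssh_dir/id_rsa.pub"
chmod -R go-wr "$ssh_dir"              # strip group/other read+write
stat -c '%a %n' "$ssh_dir"/*           # print the resulting modes
```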

    The new error I encountered, after the 3 nodes are detected when adding the existing cluster, is:

    Can't SSH to 10.40.192.85?pc.wait_prim=no: hostname lookup failed: '10.40.192.85?pc.wait_prim=no'   (in one node only)

    Is it recommended to set this option to yes? It seems this is a good practice to ensure the server will start running even if it can't determine a primary node, e.g. if all members go down at the same time.

    Thanks,

    Cindy

  • Cindy F. Ferrer

    HI Johan,

    We want to disable SSLv3. Could you tell me which conf file cmon uses for SSL? I can't find any SSLProtocol configuration on the cmon server.

     #vi /etc/httpd/conf.d/


    Thanks,

    Cindy

  • Johan

    Hi Cindy,

    Can you create a support ticket at http://support.severalnines.com/tickets/new and then we could perhaps have a remote session to debug this?

    Best regards

    Johan 

  • Petr Hrabal

    Hi,

    It's working fine after the update,

    thanks for the quick assistance.

    BR

    Petr

  • Art van Scheppingen

    Hi Min,

    I'll create a ticket for this as it requires us to have a look at configuration files and other log files of your ClusterControl instance.

    Let's continue there.

    Best regards,

    Art

  • Johan

    Hi Cindy,

    Can you go to (in the web browser):

    http://clustercontroladdress/clustercontrol/admin/#g:jobs

    Click on the failed job (add existing). Can you send the output from that job?

    Thanks,

    johan

  • Krzysztof Ksiazek

    Hi,

    This error is expected - if you asked CC to provision a host where MySQL is already running, it stops the process in case it's a mistake - we don't want to cause any harm.

    I'm not sure what exactly happened here, though - have you installed the MariaDB node manually? Can it be stopped? If yes, you can try two things. First, you can stop MariaDB and then restart the job to create a new cluster from the CC UI. If it fails with the previous error (Unable to correct problems, you have held broken packages), you can try to set up the MariaDB node manually and then do "Add existing server/cluster" from the CC UI.

    If the problem persists, I'd suggest opening a ticket with us and we can try to look at what's going on using some kind of screen-share session (TeamViewer or Join.me).

    Thanks,

    Krzysztof

  • Cindy F. Ferrer

    Hi Johan,

    I'm all set! Here is Alex's resolution. These were the changes that were made:

    • Upgraded to the latest frontend + backend builds of ClusterControl 1.2.9

    • Enabled passwordless sudo for the 'cmon' os user for all db nodes.

    • Changed wsrep_cluster_address=gcomm://ip1,ip2,ip3?pc.wait_prim=no to wsrep_cluster_address=gcomm://ip1,ip2,ip3. pc.wait_prim=no can only be used when the SST method is mysqldump, while we have rsync set.

      Also, ClusterControl supports bootstrapping a Galera cluster (and handles cluster/node recovery), so that option is no longer needed either.

    • Copied /etc/my.cnf.d/cluster.cnf to /etc/my.cnf instead.
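    For reference, the passwordless-sudo change for the 'cmon' OS user corresponds to a sudoers entry along these lines (the drop-in file path is an assumption; always edit with `visudo`):

```
# /etc/sudoers.d/cmon (hypothetical path)
cmon ALL=(ALL) NOPASSWD: ALL
```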

     

    Regards,

    Cindy

  • Johan

    Hi,

    Assuming you are on 1.2.11 already:

    Please do (as root/sudo):

    service cmon stop

    debian/ubuntu:

    apt-get update && apt-get install clustercontrol-controller clustercontrol clustercontrol-cmonapi

    redhat/centos:

    yum clean all; yum install clustercontrol-controller clustercontrol clustercontrol-cmonapi

    Then do:

    service cmon start

    You should now have:

    Controller:  1.2.11-998

    UI:   1.2.11-842

    Many thanks for this report!

    BR

    Johan

  • Min Huber

    sure, thanks a lot.

  • Cindy F. Ferrer

    Here's the output from the failed job.

    179: Can't SSH to 10.40.192.85?pc.wait_prim=no: hostname lookup failed: '10.40.192.85?pc.wait_prim=no'

    178: Verifying the SSH connection to the nodes.

    177: Checking the nodes that those aren't in another cluster.

    176: Found in total 3 nodes.

    175: Found node: 10.40.192.85?pc.wait_prim=no

    174: Found node: 10.40.193.46

    173: Found node: 10.40.191.170

    172: Output: wsrep_cluster_address gcomm:// 10.40.191.170,10.40.193.46,10.40.192.85?pc.wait_prim=no SSH command: /usr/bin/mysql -u'root' -p '**********' -N -B -e "show global variables like 'wsrep_cluster_address';"

     

    For the resolution, how long do you think it will take?

    Thanks a lot Johan! 

     

    Regards,

    Cindy

  • chris

    Hi,

    I got the same output error (Unable to correct problems, you have held broken packages) with what you suggested. Regarding my previous question: should I set up the complete cluster and then add one node manually when I'm done? I ask because I have limited testing hardware.

    Thanks

    Chris

  • Cindy F. Ferrer

    Hi Johan,

    Thank you! Another concern: I tried to delete the first cluster group added in the controller, but when re-adding it I keep receiving this output during the installation progress (Adding Existing Cluster).

    Installation Progress:

    6101 - Message sent to controller

    6102 - Verifying controller host and cmon password.

    6103 - Verifying the SSH access to the controller.

    6104 - Verifying job parameters.

    6105 - Verifying the SSH connection to sgtl1db3.euwest.

    6106 - Verifying the MySQL user/password.

    6107 - Getting node list from the MySQL server.

    6108 - Found node: '10.40.192.85'

    6109 - Found node: '10.40.193.46'

    6110 - Found node: '10.40.191.170'

    6111 - Found in total 3 nodes.

    6112 - Checking the nodes that those aren't in another cluster.

    6113 - Host (10.40.192.85) is already in an other cluster.

    I only have these addresses in my config file.

    wsrep_cluster_address="gcomm://10.40.191.170,10.40.193.46,10.40.192.85?pc.wait_prim=no"

     

    Regards,

    Cindy

  • Ahmad

    25 - Message sent to controller

    26 - Verifying controller host and cmon password.

    27 - Verifying the SSH access to the controller.

    28 - Could not SSH to controller host (192.168.0.6): libssh auth error: Access denied. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password (root, key=/root/.ssh/id_rsa)

    Job failed.

    I'm also facing the same issue. Following the solution above, same as Cindy, I already allowed port 22 and changed the permissions on the key, but the error still occurs.

    May I know what else I am supposed to do?

    Thanks.

