ClusterControl (CMON) on Galera Cluster not working

Comments

  • Johan

    Hi Stefan,

    We have not yet verified this on Percona, but can you try the following:

    Edit my.cnf and set:

    cmon_core_dir=/root/s9s-galera-2.1.0/

    Save and exit.

    Then run:

    mkdir -p /root/s9s-galera-2.1.0/mysql/config

    Since this is a cluster, I trust that the nodes have equivalent my.cnf files. The cmon controller needs to read a my.cnf for the Galera nodes in order to determine things such as the data directories of the MySQL servers. (This requirement will go away very soon, but for now this is how it works.)

    Copy one of the my.cnf files to /root/s9s-galera-2.1.0/mysql/config
    so that you have:

    $ ls /root/s9s-galera-2.1.0/mysql/config/my.cnf
    /root/s9s-galera-2.1.0/mysql/config/my.cnf

    (Please note that the s9s-galera-2.1.0 directory can be named anything, but it must contain mysql/config underneath.)
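
The steps above can be sketched as a small shell function. This is only an illustration: the function name `stage_mycnf` and its arguments are hypothetical, and the source path of the node's my.cnf is an assumption to be replaced with the real location.

```shell
#!/bin/sh
# Hypothetical helper (not part of cmon): stage one node's my.cnf under
# <core_dir>/mysql/config so the controller can read it.
stage_mycnf() {
    core_dir="$1"   # e.g. /root/s9s-galera-2.1.0 (the name is arbitrary)
    src_cnf="$2"    # a my.cnf copied from one of the Galera nodes

    mkdir -p "$core_dir/mysql/config" || return 1
    cp "$src_cnf" "$core_dir/mysql/config/my.cnf" || return 1

    # Same check as the ls shown above: prints the path if the file exists.
    ls "$core_dir/mysql/config/my.cnf"
}
```

It could be called as `stage_mycnf /root/s9s-galera-2.1.0 /path/to/node/my.cnf`, where the second argument is a placeholder for wherever the copied file actually lives.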

    How well the recovery works depends on how you have set up your Galera Cluster. Most tutorials on the internet are wrong, and as I said, we have never tested on Percona XtraDB Cluster, but we do plan to support it quite soon. So your input is very much appreciated!

    Let us know what happens, but try the above, and then send a new set of logs.

    If you can also provide the different my.cnf files that would be great too!

    Thanks,

    Johan

  • Stefan Berger

    Hi Johan,

    Thanks for your quick response!

    I have dropped the database and started from scratch, and I have also done what you advised.

    Additionally, I have stopped the cmon agents on the cluster hosts.

    I have also created a symlink to the my.cnf file in /etc on the cluster hosts, since the files are not located in /etc.

    According to the log, this seems to be required:

    Aug 31 16:23:01 : (INFO) ssh  -q -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -oNumberOfPasswordPrompts=0 -oIdentityFile=/root/.ssh/id_rsa root@10.212.11.90 "/bin/cat /etc//my.cnf" )
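
The log line shows the controller running /bin/cat on /etc//my.cnf over ssh, which is why a my.cnf has to be reachable under /etc on each node. A minimal sketch of the symlink step follows, assuming a hypothetical helper name `link_mycnf` and a placeholder path for the real configuration file:

```shell
#!/bin/sh
# Hypothetical helper: make <etc_dir>/my.cnf point at the real config file.
link_mycnf() {
    real_cnf="$1"          # actual location of my.cnf on the node (placeholder)
    etc_dir="${2:-/etc}"   # directory the controller reads from

    [ -f "$real_cnf" ] || { echo "no such file: $real_cnf" >&2; return 1; }

    # Do not overwrite an existing my.cnf or symlink.
    if [ -e "$etc_dir/my.cnf" ] || [ -L "$etc_dir/my.cnf" ]; then
        echo "$etc_dir/my.cnf already exists" >&2
        return 1
    fi

    ln -s "$real_cnf" "$etc_dir/my.cnf"
}
```

Refusing to replace an existing file is a deliberate choice in this sketch, so that a distribution-provided /etc/my.cnf is not silently shadowed.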

     

    After that I tried to start the cmon controller:

    /usr/local/cmon/sbin/cmon --config-file=/etc/cmon.cnf

    but I got a segfault. I tried this several times (about 7) and suddenly it was running.

    I have attached the console output; the syslog file has the following messages:

    Aug 31 16:37:45 mysql-host-control kernel: [ 1990.327832] cmon[925]: segfault at 7fc823128000 ip 0000000000429f97 sp 00007fc823125eb0 error 4 in cmon[400000+3e3000]
    Aug 31 16:39:10 mysql-host-control kernel: [ 2075.448526] cmon[1255]: segfault at 0 ip 00007f1fb71e4de3 sp 00007f1fb5770ab0 error 6 in libstdc++.so.6.0.13[7f1fb717b000+f6000]
    Aug 31 16:39:18 mysql-host-control kernel: [ 2082.959326] cmon[1374]: segfault at 7fd6a233d000 ip 0000000000429f97 sp 00007fd6a233aeb0 error 4 in cmon[400000+3e3000]
    Aug 31 16:39:20 mysql-host-control kernel: [ 2084.996815] cmon[1420]: segfault at 7f5bdb6e7000 ip 0000000000429f97 sp 00007f5bdb6e4eb0 error 4 in cmon[400000+3e3000]
    Aug 31 16:39:21 mysql-host-control kernel: [ 2086.586962] cmon[1467]: segfault at 7f586cca8000 ip 0000000000429f97 sp 00007f586cca5eb0 error 4 in cmon[400000+3e3000]
    Aug 31 16:39:23 mysql-host-control kernel: [ 2087.746793] cmon[1511]: segfault at 7fa8501c3000 ip 0000000000429f97 sp 00007fa8501c0eb0 error 4 in cmon[400000+3e3000]
    Aug 31 16:39:24 mysql-host-control kernel: [ 2088.890331] cmon[1555]: segfault at 7f451add9000 ip 0000000000429f97 sp 00007f451add6eb0 error 4 in cmon[400000+3e3000]
    Aug 31 16:39:43 mysql-host-control kernel: [ 2108.513499] cmon[1669]: segfault at 7fcbafc86000 ip 0000000000429f97 sp 00007fcbafc83eb0 error 4 in cmon[400000+3e3000]
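
To see at a glance whether the crashes share a location, the kernel messages can be grouped by faulting object and instruction pointer. The following is a sketch only: the function name `count_segfaults` is made up for illustration, and it assumes the kernel segfault line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: count kernel segfault messages per (object, ip) pair.
# Reads the named files, or standard input when no file is given.
count_segfaults() {
    awk '/segfault at/ {
        ip = ""; obj = ""
        # The value follows its keyword: "ip <address>" and "in <object>[...]".
        for (i = 1; i < NF; i++) {
            if ($i == "ip") ip = $(i + 1)
            if ($i == "in") obj = $(i + 1)
        }
        sub(/\[.*/, "", obj)      # drop the [base+size] mapping suffix
        count[obj " " ip]++
    }
    END { for (k in count) print count[k], k }' "$@" | sort -rn
}
```

Applied to the eight messages above, this groups seven crashes at ip 0000000000429f97 in cmon and one at ip 00007f1fb71e4de3 in libstdc++.so.6.0.13.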

     

    I will test this for some time and then activate the agents on the cluster hosts.

     

    Thanks,

    Stefan

     

  • Stefan Berger

    I have now created core dumps.

    It took 6 attempts to get cmon running (the first 5 attempts segfaulted).

    I tried to get a backtrace from the core dump, but I think I would need a "debug" build of cmon with debug symbols.

    Anyway, I have attached the results.

    Once the controller is running it seems to be stable, and agent communication is working too.
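
For reference, the backtrace attempt described above can be sketched with gdb in batch mode. The helper name `core_backtrace` and the core file path are hypothetical. Core dumps usually have to be enabled first, for example with `ulimit -c unlimited` in the shell that starts cmon. Without debug symbols the frames will most likely show up only as raw addresses.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace of all threads from a core file.
core_backtrace() {
    binary="$1"   # e.g. /usr/local/cmon/sbin/cmon
    core="$2"     # path to the core file (placeholder)

    [ -x "$binary" ] || { echo "not an executable: $binary" >&2; return 2; }
    [ -r "$core" ]   || { echo "cannot read core file: $core" >&2; return 2; }
    command -v gdb >/dev/null 2>&1 || { echo "gdb not found" >&2; return 3; }

    # -batch makes gdb exit after running the commands given with -ex.
    gdb -batch -ex "thread apply all bt" "$binary" "$core"
}
```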

     

    regards

    Stefan

  • Johan

    Hi,


    Thanks, I will look at them. I am currently setting up a Percona XtraDB Cluster. You are in somewhat uncharted territory right now (normally people deploy using our deployment packages, so we know the infrastructure, config settings, locations, etc.), but it should naturally work on an existing cluster too (and it does for MySQL Cluster and Replication, but Galera lacks some testing in that area for now).

    best regards,

    Johan
